{"id":505878,"date":"2022-11-03T03:48:37","date_gmt":"2022-11-03T00:48:37","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/how-to-debug-kubernetes-failedscheduling-errors\/"},"modified":"2022-11-03T03:48:37","modified_gmt":"2022-11-03T00:48:37","slug":"how-to-debug-kubernetes-failedscheduling-errors","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/how-to-debug-kubernetes-failedscheduling-errors\/","title":{"rendered":"How to Debug Kubernetes \u201cFailedScheduling\u201d Errors"},"content":{"rendered":"<h1><span class=\"ez-toc-section\" id=\"%E2%80%9CHow_to_Debug_Kubernetes_%E2%80%9CFailedScheduling%E2%80%9D_Errors%E2%80%9D\"><\/span>How to Debug Kubernetes \u201cFailedScheduling\u201d Errors<span class=\"ez-toc-section-end\"><\/span><\/h1>\n<div>\n<img loading=\"lazy\" decoding=\"async\" class=\"type:primaryImage alignnone size-full wp-image-806255\" data-pagespeed-no-defer=\"\" src=\"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2022\/05\/Kubernetes-New.jpg?width=1198&amp;trim=1,1&amp;bg-color=000&amp;pad=1,1\" alt=\"Graphic with the Kubernetes logo\" width=\"1202\" 
height=\"677\"\/>\n<p>Pod scheduling issues are one of the most common Kubernetes errors. There are several reasons why a new Pod can get stuck in a <code>Pending<\/code> state with <code>FailedScheduling<\/code> as its reason. A Pod that displays this status won\u2019t start any containers, so you\u2019ll be unable to use your application.<\/p>\n<p>Pending Pods caused by scheduling problems don\u2019t normally start running without some manual intervention. You\u2019ll need to investigate the root cause and take action to fix your cluster. In this article, you\u2019ll learn how to diagnose and resolve this problem so you can bring your workloads up.<\/p>\n<h2 id=\"identifying-a-failedscheduling-error\"><span class=\"ez-toc-section\" id=\"Identifying_a_FailedScheduling_Error\"><\/span>Identifying a FailedScheduling Error<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>It\u2019s normal for Pods to show a <code>Pending<\/code> status for a short period after you add them to your cluster. Kubernetes needs to schedule the Pod to one of your Nodes, and that Node has to pull the image from its registry. The first sign that a Pod has failed scheduling is that it still shows as <code>Pending<\/code> after the usual startup period has elapsed. You can check the status by running kubectl\u2019s <code>get pods<\/code> command:<\/p>\n<pre>$ kubectl get pods\n\nNAME        READY   STATUS      RESTARTS    AGE\ndemo-pod    0\/1     Pending     0           4m05s<\/pre>\n<p><code>demo-pod<\/code> is over four minutes old but it\u2019s still in the <code>Pending<\/code> state. 
Pods don\u2019t usually take this long to start containers, so it\u2019s time to start investigating what Kubernetes is waiting for.<\/p>\n<p>The next diagnostic step is to retrieve the Pod\u2019s event history using the <code>describe pod<\/code> command:<\/p>\n<pre>$ kubectl describe pod demo-pod\n\n...\nEvents:\n  Type     Reason            Age       From               Message\n  ----     ------            ----      ----               -------\n  ...\n  Warning  FailedScheduling  4m        default-scheduler  0\/4 nodes are available: 1 Too many pods, 3 Insufficient cpu.<\/pre>\n<p>The event history confirms a <code>FailedScheduling<\/code> error is the reason for the prolonged <code>Pending<\/code> state. This event is reported when Kubernetes can\u2019t assign the Pod to any of the worker nodes in your cluster.<\/p>\n<p>The event\u2019s message reveals why scheduling is currently impossible: there are four nodes in the cluster but none of them can take the Pod. Three of the nodes have insufficient CPU capacity while the other has reached a cap on the number of Pods it can accept.<\/p>\n<h2 id=\"understanding-failedscheduling-errors-and-similar-problems\"><span class=\"ez-toc-section\" id=\"Understanding_FailedScheduling_Errors_and_Similar_Problems\"><\/span>Understanding FailedScheduling Errors and Similar Problems<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Kubernetes can only schedule Pods onto nodes that have spare resources available. Nodes with exhausted CPU or memory capacity can\u2019t take any more Pods. Pods can also fail scheduling if they explicitly request more resources than any node can provide. Refusing to place Pods that don\u2019t fit maintains your cluster\u2019s stability.<\/p>\n<p>The Kubernetes control plane is aware of the Pods already allocated to the nodes in your cluster. It uses this information to determine the set of nodes that can receive a new Pod. 
A scheduling error results when there are no candidates available, leaving the Pod stuck <code>Pending<\/code> until capacity is freed up.<\/p>\n<p>Kubernetes can fail to schedule Pods for other reasons too. There are several ways in which nodes can be deemed ineligible to host a Pod, despite having adequate system resources:<\/p>\n<ul>\n<li>The node might have been cordoned by an administrator to stop it from receiving new Pods ahead of a maintenance operation.<\/li>\n<li>The node could be <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/taint-and-toleration\">tainted<\/a> with an effect that prevents Pods from scheduling. Your Pod won\u2019t be accepted by the node unless it has a corresponding toleration.<\/li>\n<li>Your Pod might be <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/configuration\/overview\">requesting a <code>hostPort<\/code><\/a> that is already bound on the node. Nodes can only provide a particular port number to a single Pod at a time.<\/li>\n<li>Your Pod could be using <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/assign-pod-node\">a <code>nodeSelector<\/code><\/a> that means it has to be scheduled to a node with a particular label. Nodes that lack the label won\u2019t be eligible.<\/li>\n<li>Pod and Node affinities and anti-affinities might be unsatisfiable, causing a scheduling conflict that prevents new Pods from being accepted.<\/li>\n<li>The Pod might have <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/assign-pod-node\/#nodename\">a <code>nodeName<\/code> field<\/a> that identifies a specific node to schedule to. 
The Pod will be stuck pending if that node is offline or unschedulable.<\/li>\n<\/ul>\n<p>It\u2019s the responsibility of <code>kube-scheduler<\/code>, the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/kube-scheduler\">Kubernetes scheduler<\/a>, to work through these conditions and identify the set of nodes that can take a new Pod. A <code>FailedScheduling<\/code> event occurs when none of the nodes satisfy the criteria.<\/p>\n<h2 id=\"resolving-the-failedscheduling-state\"><span class=\"ez-toc-section\" id=\"Resolving_the_FailedScheduling_State\"><\/span>Resolving the FailedScheduling State<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The message displayed next to <code>FailedScheduling<\/code> events usually reveals why each node in your cluster was unable to take the Pod. You can use this information to start addressing the problem. In the example shown above, the cluster had four nodes: three with insufficient CPU capacity and one that had reached its Pod count limit.<\/p>\n<p>Cluster capacity is the root cause in this case. You can scale your cluster with new nodes to resolve hardware consumption problems, adding resources that will provide extra flexibility. As this will also raise your costs, it\u2019s worthwhile checking whether you\u2019ve got any redundant Pods in your cluster first. 
Deleting unused resources will free up capacity for new ones.<\/p>\n<p>You can inspect the available resources on each of your nodes using the <code>describe node<\/code> command:<\/p>\n<pre>$ kubectl describe node demo-node\n\n...\nAllocated resources:\n  (Total limits may be over 100 percent, i.e., overcommitted.)\n  Resource           Requests     Limits\n  --------           --------     ------\n  cpu                812m (90%)   202m (22%)\n  memory             905Mi (57%)  715Mi (45%)\n  ephemeral-storage  0 (0%)       0 (0%)\n  hugepages-2Mi      0 (0%)       0 (0%)<\/pre>\n<p>Pods on this node are already requesting 57% of the available memory. If a new Pod requested 1 GiB of memory for itself, the node would be unable to accept it, because only around 680Mi remains unrequested. Monitoring this information for each of your nodes can help you assess whether your cluster is running short of capacity. It\u2019s important to have spare capacity available in case one of your nodes becomes unhealthy and its workloads have to be rescheduled to another.<\/p>\n<p>Scheduling failures due to there being no schedulable nodes will show a message similar to the following in the <code>FailedScheduling<\/code> event:<\/p>\n<pre>0\/4 nodes are available: 4 node(s) were unschedulable<\/pre>\n<p>Nodes that are unschedulable because they\u2019ve been cordoned will include <code>SchedulingDisabled<\/code> in their status field:<\/p>\n<pre>$ kubectl get nodes\nNAME       STATUS                     ROLES                  AGE   VERSION\nnode-1     Ready,SchedulingDisabled   control-plane,master   26m   v1.23.3<\/pre>\n<p>You can uncordon the node to allow it to receive new Pods:<\/p>\n<pre>$ kubectl uncordon node-1\nnode\/node-1 uncordoned<\/pre>\n<p>When nodes aren\u2019t cordoned and have sufficient resources, scheduling errors are normally caused by taints or an incorrect <code>nodeSelector<\/code> field on your Pod. 
If you\u2019re <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/scheduling-eviction\/assign-pod-node\/#nodeselector\">using <code>nodeSelector<\/code><\/a>, check you haven\u2019t made a typo and that there are nodes in your cluster that have the labels you\u2019ve specified.<\/p>\n<p>When nodes are tainted, make sure you\u2019ve included the corresponding toleration in your Pod\u2019s manifest. As an example, this command taints a node so Pods won\u2019t schedule onto it unless they have a <code>demo-taint: allow<\/code> toleration:<\/p>\n<pre>$ kubectl taint nodes node-1 demo-taint=allow:NoSchedule<\/pre>\n<p>Modify your Pod manifests so they can schedule onto the Node:<\/p>\n<div class=\"wp-geshi-highlight-wrap5\">\n<div class=\"wp-geshi-highlight-wrap4\">\n<div class=\"wp-geshi-highlight-wrap3\">\n<div class=\"wp-geshi-highlight-wrap2\">\n<div class=\"wp-geshi-highlight-wrap\">\n<div class=\"wp-geshi-highlight\">\n<div class=\"yaml\">\n<pre class=\"de1\"><strong class=\"co4\">spec<\/strong>:<strong class=\"co4\">\n  tolerations<\/strong>:<strong class=\"co3\">\n    - key<\/strong><strong class=\"sy2\">: <\/strong>demo-taint<strong class=\"co3\">\n      operator<\/strong><strong class=\"sy2\">: <\/strong>Equal<strong class=\"co3\">\n      value<\/strong><strong class=\"sy2\">: <\/strong>allow<strong class=\"co3\">\n      effect<\/strong><strong class=\"sy2\">: <\/strong>NoSchedule<\/pre>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Resolving the problem that caused the <code>FailedScheduling<\/code> state will allow Kubernetes to resume scheduling your pending Pods. They\u2019ll start running automatically shortly after the control plane detects the changes to your nodes. 
You don\u2019t need to manually restart or recreate your Pods, unless the issue\u2019s due to mistakes in your Pod\u2019s manifest such as incorrect affinity or <code>nodeSelector<\/code> fields.<\/p>\n<h2 id=\"summary\"><span class=\"ez-toc-section\" id=\"Summary\"><\/span>Summary<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><code>FailedScheduling<\/code> errors occur when Kubernetes can\u2019t place a new Pod onto any node in your cluster. This is often because your existing nodes are running low on hardware resources such as CPU, memory, and disk. When this is the case, you can resolve the problem by scaling your cluster to include <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/kubernetes.io\/docs\/concepts\/architecture\/nodes\">additional nodes<\/a>.<\/p>\n<p>Scheduling failures also arise when Pods specify affinities, anti-affinities, and node selectors that can\u2019t currently be satisfied by the nodes available in your cluster. Cordoned and tainted nodes further reduce the options available to Kubernetes. This kind of issue can be addressed by checking your manifests for typos in labels and removing constraints you no longer need.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/www.howtogeek.com\/devops\/how-to-debug-kubernetes-failedscheduling-errors\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;How to Debug Kubernetes \u201cFailedScheduling\u201d Errors&#8221; Pod scheduling issues are one of the most common Kubernetes errors. There are several reasons why a new Pod can get stuck in a Pending state with FailedScheduling as its reason. 
A Pod that displays this status won\u2019t start any containers so you\u2019ll be unable to use your application&#8230;.<\/p>\n","protected":false},"author":1,"featured_media":505879,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2022\/05\/Kubernetes-New.jpg?height=200p&trim=2,2,2,2","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-505878","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/505878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=505878"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/505878\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/505879"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=505878"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=505878"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=505878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}