{"id":295688,"date":"2021-07-09T16:30:08","date_gmt":"2021-07-09T13:30:08","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/what-is-a-web-crawler-and-how-does-it-work\/"},"modified":"2021-07-09T16:30:08","modified_gmt":"2021-07-09T13:30:08","slug":"what-is-a-web-crawler-and-how-does-it-work","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/","title":{"rendered":"What Is a Web Crawler, and How Does It Work?"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a41b5efa9878\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a41b5efa9878\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/#Search_Engines_and_Crawlers\" >Search Engines and Crawlers<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/#Site_Maps_and_Selection\" >Site Maps and Selection<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/#Robots_and_the_Politeness_Factor\" >Robots and the Politeness Factor<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/#Metadata_Magic\" >Metadata Magic<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/buradabiliyorum.com\/en\/what-is-a-web-crawler-and-how-does-it-work\/#Your_Searching\" >Your Searching<\/a><\/li><\/ul><\/nav><\/div>\n<div>\n<figure style=\"width: 1200px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"type:primaryImage wp-image-732214 size-full\" src=\"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2021\/05\/web_crawler_header.jpg?width=1198&amp;trim=1,1&amp;bg-color=000&amp;pad=1,1\" alt=\"A spider made out of ones and zeroes.\" width=\"1200\" height=\"675\" onload=\"pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\" onerror=\"this.onerror=null;pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\"\/><figcaption class=\"wp-caption-text\"><span class=\"type:primaryImage imagecredit\"><a rel=\"nofollow noopener\" target=\"_blank\" 
href=\"https:\/\/www.shutterstock.com\/image-illustration\/spider-digital-background-concept-web-crawler-347739635\">Enzozo \/ Shutterstock<\/a><\/span><\/figcaption><\/figure>\n<p>Have you ever searched for something on Google and wondered, \u201cHow does it know where to look?\u201d The answer is \u201cweb crawlers,\u201d which search the web and index it so that you can find things easily online. We\u2019ll explain.<\/p>\n<h2 role=\"heading\" aria-level=\"2\"><span class=\"ez-toc-section\" id=\"Search_Engines_and_Crawlers\"><\/span>Search Engines and Crawlers<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>When you search using a keyword on a search engine like Google or Bing, the site sifts through trillions of pages to generate a list of results related to that term. How exactly do these search engines have all of these pages on file, know how to look for them, and generate these results within seconds?<\/p>\n<p>The answer is web crawlers, also known as spiders. These are automated programs (often called \u201crobots\u201d or \u201cbots\u201d) that \u201ccrawl,\u201d or browse, the web so that the pages they find can be added to search engines. These robots index websites to create a list of pages that eventually appear in your search results.<\/p>\n<p>Crawlers also create and store copies of these pages in the engine\u2019s database, which allows you to make searches almost instantly. 
It\u2019s also the reason why search engines often include cached versions of sites in their databases.<\/p>\n<p><strong>RELATED:<\/strong> <strong><em>How to Access a Web Page When It&#8217;s Down<\/em><\/strong><\/p>\n<h2 role=\"heading\" aria-level=\"2\"><span class=\"ez-toc-section\" id=\"Site_Maps_and_Selection\"><\/span>Site Maps and Selection<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure style=\"width: 650px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-732213 size-full\" src=\"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2021\/05\/man_with_charts.jpg?trim=1,1&amp;bg-color=000&amp;pad=1,1\" alt=\"An illustration of a man in front of a flowchart.\" width=\"650\" height=\"478\" onload=\"pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\" onerror=\"this.onerror=null;pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\"\/><figcaption class=\"wp-caption-text\"><span class=\"imagecredit\"><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.shutterstock.com\/image-vector\/using-website-flowchart-sitemap-connecting-working-245428414\">Griboedov \/ Shutterstock<\/a><\/span><\/figcaption><\/figure>\n<p>So, how do crawlers pick which websites to crawl? Well, the most common scenario is that website owners want search engines to crawl their sites. They can do this by asking Google, Bing, Yahoo, or another search engine to index their pages. This process varies from engine to engine. Search engines also frequently select popular, well-linked websites to crawl by tracking how often a URL is linked from other public sites.<\/p>\n<p>Website owners can use certain processes to help search engines index their websites, such as uploading a site map. This is a file containing all the links and pages that are part of your website. 
It\u2019s normally used to indicate which pages you\u2019d like indexed.<\/p>\n<p>Once a search engine has crawled a website, it will automatically crawl that site again. The frequency varies based on how popular a website is, among other metrics. Therefore, site owners often keep their site maps up to date to let engines know which new pages to index.<\/p>\n<h2 role=\"heading\" aria-level=\"2\"><span class=\"ez-toc-section\" id=\"Robots_and_the_Politeness_Factor\"><\/span>Robots and the Politeness Factor<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure style=\"width: 650px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-732212 size-full\" src=\"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2021\/05\/robotstxt.jpg?trim=1,1&amp;bg-color=000&amp;pad=1,1\" alt=\"\" width=\"650\" height=\"327\" onload=\"pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\" onerror=\"this.onerror=null;pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\"\/><figcaption class=\"wp-caption-text\"><span class=\"imagecredit\"><a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.shutterstock.com\/image-photo\/word-robots-formed-by-wooden-blocks-1187692504\">Devenorr \/ Shutterstock<\/a><\/span><\/figcaption><\/figure>\n<p>What if a website <em>doesn\u2019t<\/em> want some or all of its pages to appear on a search engine? For example, you might not want people to search for a members-only page or see your 404 error page. This is where the crawl exclusion list, also known as robots.txt, comes into play. This is a simple text file that tells crawlers which web pages to skip.<\/p>\n<p>Another reason why robots.txt is important is that web crawlers can have a significant effect on site performance. Because crawlers are essentially downloading all the pages on your website, they consume resources and can cause slowdowns. 
They arrive at unpredictable times and without approval. If you don\u2019t need your pages indexed repeatedly, then stopping crawlers might help reduce some of your website load. Fortunately, most crawlers respect the site owner\u2019s rules and stop crawling the pages those rules exclude.<\/p>\n<h2 role=\"heading\" aria-level=\"2\"><span class=\"ez-toc-section\" id=\"Metadata_Magic\"><\/span>Metadata Magic<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-731931\" src=\"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2021\/05\/HowToGeek-Search-Results.png?trim=1,1&amp;bg-color=000&amp;pad=1,1\" alt=\"Google Search HowToGeek\" width=\"650\" height=\"350\" onload=\"pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\" onerror=\"this.onerror=null;pagespeed.lazyLoadImages.loadIfVisibleAndMaybeBeacon(this);\"\/><\/p>\n<p>Under the URL and title of every search result in Google, you will find a short description of the page. These descriptions are called snippets. You might notice that the snippet of a page in Google doesn\u2019t always line up with the website\u2019s actual content. This is because many websites have something called \u201c<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/appearance\/good-titles-snippets\">meta tags<\/a>,\u201d which let site owners add custom descriptions to their pages.<\/p>\n<p>Site owners often come up with enticing metadata descriptions written to make you want to click on a website. Google also lists other meta-information, such as prices and stock availability. This is especially useful for those running e-commerce websites.<\/p>\n<h2 role=\"heading\" aria-level=\"2\"><span class=\"ez-toc-section\" id=\"Your_Searching\"><\/span>Your Searching<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Web searching is an essential part of using the internet. 
Searching the web is a great way to discover new websites, stores, communities, and interests.\u00a0Every day, web crawlers visit millions of pages and add them to search engines. While crawlers have some downsides, like taking up site resources, they\u2019re invaluable to both site owners and visitors.<\/p>\n<p><strong>RELATED:<\/strong> <strong><em>How to Delete the Last 15 Minutes of Google Search History<\/em><\/strong><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/www.howtogeek.com\/731787\/what-is-a-web-crawler-and-how-does-it-work\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;What Is a Web Crawler, and How Does It Work?&#8221; Enzozo \/ Shutterstock Have you ever searched for something on Google and wondered, \u201cHow does it know where to look?\u201d The answer is \u201cweb crawlers,\u201d which search the web and index it so that you can find things easily online. We\u2019ll explain. 
Search Engines and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":295689,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/www.howtogeek.com\/wp-content\/uploads\/2021\/05\/web_crawler_header.jpg?height=200p&trim=2,2,2,2","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-295688","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/295688","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=295688"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/295688\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/295689"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=295688"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=295688"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=295688"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}