{"id":142725,"date":"2020-12-28T14:00:25","date_gmt":"2020-12-28T11:00:25","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/why-web-scraping-is-vital-to-democracy\/"},"modified":"2020-12-28T14:00:25","modified_gmt":"2020-12-28T11:00:25","slug":"why-web-scraping-is-vital-to-democracy","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/why-web-scraping-is-vital-to-democracy\/","title":{"rendered":"#Why web scraping is vital to democracy"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/12\/1-copy-30-796x417.jpg\" \/><\/p>\n<div>\n<p>The fruits of web scraping \u2014 using code to harvest data and information from websites \u2014 are all around us.<\/p>\n<p>People build scrapers that can <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/alltheplaces\/alltheplaces\/tree\/master\/locations\/spiders\">find every Applebee\u2019s on the planet<\/a>, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/github.com\/unitedstates\/congress\">collect congressional legislation and votes<\/a>, or <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.watchpatrol.net\">track fancy watches for sale<\/a> on fan websites. Businesses use scrapers to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/service.octoparse.com\/inventory-web-scraping-blind-rivet-supply\">manage their online retail inventory<\/a> and monitor <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/datahut.co\/solutions\/\">competitors\u2019 prices<\/a>. Many well-known sites use scrapers to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.skyscanner.com\">track airline ticket prices<\/a> and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.careerbuilder.com\">job listings<\/a>. 
Google is essentially a giant, crawling web scraper.<\/p>\n<p>Scrapers are also the tools of watchdogs and journalists, which is why The Markup filed an <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.supremecourt.gov\/DocketPDF\/19\/19-783\/147271\/20200708180752488_19-783%20-%20the%20markup%20amicus%20brief%20for%20e-filing%207-8-2020.pdf\">amicus brief<\/a> in a case, argued before the U.S. Supreme Court this week, that threatens to make scraping illegal.<\/p>\n<p>The case itself\u2014<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.supremecourt.gov\/search.aspx?filename=\/docket\/docketfiles\/html\/public\/19-783.html\"><em>Van Buren v. United States<\/em><\/a>\u2014is not about scraping but rather a legal question regarding the prosecution of a Georgia police officer, Nathan Van Buren, who was bribed to look up confidential information in a law enforcement database. Van Buren was prosecuted under the Computer Fraud and Abuse Act (CFAA), which prohibits unauthorized access to computers, as in computer hacking, where someone breaks into a system to steal information (or, as dramatized in the 1980s classic movie \u201c<a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.imdb.com\/title\/tt0086567\/\">WarGames<\/a>,\u201d potentially start World War\u00a0III).<\/p>\n<p>In Van Buren\u2019s case, since he was allowed to access the database for work, the question is whether the court will broadly define his troubling activities as \u201cexceeding authorized access\u201d to extract data, which would make them a crime under the CFAA. 
And it\u2019s that definition that could affect journalists.<\/p>\n<p>A broad reading could also, as Justice Neil Gorsuch put it during Monday\u2019s oral arguments, lead in the direction of \u201cperhaps making a federal criminal of us all.\u201d<\/p>\n<p>Investigative journalists and other watchdogs often use scrapers to illuminate issues big and small, from <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/manolo.rocks\">tracking the influence of lobbyists in Peru<\/a> by harvesting the digital visitor logs for government buildings to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/adobservatory.org\">monitoring and collecting<\/a> political ads on Facebook. In both of those instances, the pages and data scraped are publicly available on the internet\u2014no hacking necessary\u2014but the sites involved could easily change the fine print in their terms of service to label the aggregation of that information \u201cunauthorized.\u201d And the U.S. Supreme Court, depending on how it rules, could decide that violating those terms of service is a crime under the CFAA.<\/p>\n<p>\u201cA statute that allows powerful forces like the government or wealthy corporate actors to unilaterally criminalize newsgathering activities by blocking these efforts through the terms of service for their websites would violate the First Amendment,\u201d The Markup wrote in our brief.<\/p>\n<p>What sort of work is at risk? 
Here\u2019s a roundup of some recent journalism made possible by web scraping:<\/p>\n<ul>\n<li>The <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/covidtracking.com\">COVID Tracking Project<\/a>, from The Atlantic, collects and aggregates data from around the country every day, making it possible to monitor where testing is happening, where the pandemic is growing, and the racial disparities in who\u2019s contracting and dying from the virus.<\/li>\n<li>This <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/revealnews.org\/topic\/to-protect-and-slur\/\">project<\/a>, from Reveal, scraped extremist Facebook groups and compared their membership rolls to those of law enforcement groups on Facebook\u2014and found a lot of overlap.<\/li>\n<li>The Markup\u2019s recent investigation into Google\u2019s search results found that the search engine consistently <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/themarkup.org\/google-the-giant\/2020\/07\/28\/google-search-results-prioritize-google-products-over-competitors\">favors its own products<\/a>, leaving some websites from which the web giant itself scrapes information struggling for visitors and, therefore, ad revenue. The U.S. 
Department of Justice <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/themarkup.org\/google-the-giant\/2020\/10\/20\/google-antitrust-lawsuit-markup-investigations\">cited the issue<\/a> in an antitrust lawsuit against the company.<\/li>\n<li>In <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.usatoday.com\/pages\/interactives\/asbestos-sharia-law-model-bills-lobbyists-special-interests-influence-state-laws\/\">Copy, Paste, Legislate<\/a>, USA Today found a pattern of cookie-cutter laws, pushed by special interest groups, circulating in legislatures around the country.<\/li>\n<\/ul>\n<p><em>This article was <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/themarkup.org\/news\/2020\/12\/03\/why-web-scraping-is-vital-to-democracy\">originally published on The Markup<\/a> and was republished under the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives<\/a> license.<\/em><\/p>\n<\/p><\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/syndication\/2020\/12\/28\/why-web-scraping-is-vital-to-democracy\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Why web scraping is vital to democracy&#8221; The fruits of web scraping \u2014 using code to harvest data and information from websites \u2014 are all around us. People build scrapers that can find every Applebee\u2019s on the planet or collect congressional legislation and votes or track fancy watches for sale on fan websites. 
Businesses use&#8230;<\/p>\n","protected":false},"author":1,"featured_media":142726,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/tnw?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2020\/12\/1-copy-30.jpg&signature=15d4c08c2a2cc15cbe11497d6c6dc423","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[72366,71916,70228,70759,87985,73708],"class_list":["post-142725","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-data","tag-journalist","tag-law-enforcement","tag-tech","tag-terms-of-service","tag-web-scraping"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/142725","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=142725"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/142725\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/142726"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=142725"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=142725"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=142725"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}