{"id":332368,"date":"2021-08-30T17:05:00","date_gmt":"2021-08-30T14:05:00","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/excel-autocorrect-errors-are-plaguing-gene-research\/"},"modified":"2021-08-30T17:05:00","modified_gmt":"2021-08-30T14:05:00","slug":"excel-autocorrect-errors-are-plaguing-gene-research","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/","title":{"rendered":"#Excel autocorrect errors are plaguing gene research"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_84 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a2d8c633c45e\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a2d8c633c45e\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/#Excel_makes_incorrect_assumptions\" >Excel makes incorrect assumptions<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/#An_ongoing_problem\" >An ongoing problem<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/#Small_errors_matter\" >Small errors matter<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/#Spreadsheet_catastrophes\" >Spreadsheet catastrophes<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/buradabiliyorum.com\/en\/excel-autocorrect-errors-are-plaguing-gene-research\/#Better_tools_are_available\" >Better tools are available<\/a><\/li><\/ul><\/nav><\/div>\n<div><p>Auto-correction, or predictive text, is a common feature of many modern tech tools, from internet searches to messaging apps and word processors. Auto-correction can be a blessing, but when the algorithm makes mistakes it can change the message in dramatic and sometimes hilarious ways.<\/p>\n<p>Our research shows that autocorrect errors, particularly in Excel spreadsheets, can also make a mess of gene names in genetic research. 
We surveyed more than 10,000 papers with Excel gene lists published between 2014 and 2020 and found that <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1008984\">more than 30%<\/a> contained at least one gene name mangled by autocorrect.<\/p>\n<p>This follows our 2016 study, which found that <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/genomebiology.biomedcentral.com\/articles\/10.1186\/s13059-016-1044-7\">around 20%<\/a> of papers contained these errors, so the problem may be getting worse. We believe the lesson for researchers is clear: it\u2019s past time to stop using Excel and learn to use more powerful software.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Excel_makes_incorrect_assumptions\"><\/span>Excel makes incorrect assumptions<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Spreadsheets try to guess what type of data the user has entered and convert it accordingly. If you type a phone number starting with zero, Excel treats it as a numeric value and removes the leading zero. If you type \u201c=8\/2\u201d, the result will appear as \u201c4\u201d, but if you type \u201c8\/2\u201d it will be recognized as a date.<\/p>\n<p>With scientific data, the simple act of opening a file in Excel with the default settings can corrupt the data through auto-correction. It\u2019s possible to avoid unwanted auto-correction by formatting cells as text before pasting or importing data, but this and other data hygiene practices aren\u2019t widely followed.<\/p>\n<p>In genetics, it was recognized as far back as <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-80\">2004<\/a> that Excel was likely to convert about 30 human gene and protein names to dates. 
These included names such as <em>MARCH1<\/em>, <em>SEPT1<\/em>, <em>Oct-4<\/em>, and <em>jun<\/em>.<\/p>\n<p>Several years ago, we spotted this error in supplementary data files attached to a high-impact journal article and became interested in how widespread such errors were. Our 2016 article indicated that the problem affected mid- and high-ranking journals at roughly equal rates. This suggested to us that researchers and journals were largely unaware of the autocorrect problem and how to avoid it.<\/p>\n<p>As a result of our 2016 report, the HUGO Gene Nomenclature Committee (HGNC), the official body responsible for naming human genes, renamed the most problematic genes. <em>MARCH1<\/em> and <em>SEPT1<\/em> were changed to <em>MARCHF1<\/em> and <em>SEPTIN1<\/em>, respectively, and other affected genes were renamed in a similar way.<\/p>\n<figure class=\"align-center \"><img decoding=\"async\" src=\"https:\/\/images.theconversation.com\/files\/417723\/original\/file-20210825-21390-6jkwc9.png?ixlib=rb-1.1.0&amp;q=45&amp;auto=format&amp;w=754&amp;fit=clip\" alt=\"Example list of gene names in Excel\"\/><figcaption><span class=\"caption\">An example list of gene names in Excel.<\/span><\/figcaption><\/figure>\n<h2><span class=\"ez-toc-section\" id=\"An_ongoing_problem\"><\/span>An ongoing problem<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Earlier this year we repeated our analysis. This time we expanded it to cover a wider selection of open-access journals, expecting that researchers and journals would by now be taking steps to prevent such errors from appearing in their supplementary data files.<\/p>\n<p>We were shocked to find that 3,436 articles published between 2014 and 2020, around 31% of our sample, contained <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1008984\">gene name errors<\/a>. 
It seems the problem has not gone away and is actually getting worse.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/datawrapper.dwcdn.net\/ThKv8\/2\/\" width=\"100%\" height=\"400px\" frameborder=\"0\"><\/iframe><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Small_errors_matter\"><\/span>Small errors matter<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Some argue these errors don\u2019t really matter, because 30 or so genes are only a small fraction of the roughly 44,000 in the entire human genome, and the errors are unlikely to overturn the conclusions of any particular genomic study.<\/p>\n<p>Anyone reusing these supplementary data files will find this small set of genes missing or corrupted. This might be irritating if your research project examines the <em>SEPT<\/em> gene family, but it\u2019s just one of many gene families in existence.<\/p>\n<p>We believe the errors do matter, because they raise questions about how such mistakes can sneak into scientific publications. If gene name autocorrect errors can pass peer review undetected into published data files, what other errors might also be lurking among the thousands of data points?<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Spreadsheet_catastrophes\"><\/span>Spreadsheet catastrophes<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In business and finance, there are many examples where spreadsheet errors led to <a rel=\"nofollow noopener\" target=\"_blank\" href=\"http:\/\/www.eusprig.org\/horror-stories.htm\">costly and embarrassing losses<\/a>.<\/p>\n<p>In 2012, JPMorgan Chase declared a loss of more than US$6 billion owing to a series of trading blunders made possible by <a rel=\"nofollow noopener\" target=\"_blank\" 
href=\"https:\/\/qz.com\/119578\/damn-you-excel-spreadsheets-jp-morgan-edition\/\">formula errors<\/a> in its modeling spreadsheets. Analysis of thousands of spreadsheets at Enron Corporation, from before its spectacular downfall in 2001, shows that <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/ieeexplore.ieee.org\/document\/7202944\">almost a quarter contained errors<\/a>.<\/p>\n<p>A now-infamous article by Harvard economists Carmen Reinhart and Kenneth Rogoff was used to justify austerity cuts in the aftermath of the global financial crisis, but the analysis contained a <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/the-reinhart-rogoff-error-or-how-not-to-excel-at-economics-13646\">critical Excel error<\/a> that left five of the 20 countries out of their calculations.<\/p>\n<p>Just last year, a <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.bbc.com\/news\/technology-54423988\">spreadsheet error at Public Health England<\/a> led to the loss of data corresponding to around 15,000 positive COVID-19 cases. This compromised contact-tracing efforts for eight days while case numbers were rapidly growing. In the healthcare setting, the rate of <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/bmjopen.bmj.com\/content\/3\/5\/e002406.short\">clinical data entry errors<\/a> in spreadsheets can be as high as 5%, while a separate <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.igi-global.com\/article\/spreadsheet-error-types-and-their-prevalence-in-a-healthcare-context\/197349\">study of hospital administration spreadsheets<\/a> found that 11 of 12 contained critical flaws.<\/p>\n<p>In biomedical research, a mistake in preparing a sample sheet shifted a whole set of sample labels by one position, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/nm0610-618a\">completely changing the genomic analysis results<\/a>. 
These results mattered because they were being used to decide which drugs patients would receive in a subsequent clinical trial. This may be an isolated case, but we don\u2019t really know how common such errors are in research, because few studies have systematically looked for them.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Better_tools_are_available\"><\/span>Better tools are available<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Spreadsheets are versatile and useful, but they have their limitations. Many businesses have moved away from spreadsheets to specialized accounting software, and few IT professionals would use a spreadsheet to manage data when relational database systems that use SQL are far more robust and capable.<\/p>\n<p>However, it is still common for scientists to use Excel files to share their supplementary data online. But as science becomes more data-intensive and the limitations of Excel become more apparent, it may be time for researchers to give spreadsheets the boot.<\/p>\n<p>In genomics and other data-heavy sciences, scripting languages such as Python and R are clearly superior to spreadsheets. They offer benefits including more advanced analytical techniques, reproducibility, auditability, and better management of code versions and contributions from different individuals. They may be harder to learn initially, but the gains in scientific rigor make them worth it in the long run.<\/p>\n<p>Excel is suited to small-scale data entry and lightweight analysis. <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/www.bbc.com\/news\/technology-37176926\">Microsoft says<\/a> Excel\u2019s default settings are designed to satisfy the needs of most users, most of the time.<\/p>\n<p>Clearly, genomic science does not represent a common use case. 
In our view, any data set larger than 100 rows is simply not suitable for a spreadsheet.<\/p>\n<p>Researchers in data-intensive fields (particularly in the life sciences) need better computing skills. Initiatives such as <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/software-carpentry.org\/about\/\">Software Carpentry<\/a> offer workshops to researchers, but universities should also focus more on giving undergraduates the advanced analytical skills they will need.<!-- Below is The Conversation's page counter tag. Please DO NOT REMOVE. --><img decoding=\"async\" loading=\"lazy\" style=\"border: none !important;margin: 0 !important;max-height: 1px !important;max-width: 1px !important;min-height: 1px !important;min-width: 1px !important;padding: 0 !important\" alt=\"The Conversation\" width=\"1\" height=\"1\" class=\"js-lazy\" src=\"https:\/\/counter.theconversation.com\/content\/166554\/count.gif?distributor=republish-lightbox-basic\"\/><!-- End of code. If you don't see any code above, please get new code from the Advanced tab after you click the republish button. The page counter does not collect any personal data. 
More info: https:\/\/theconversation.com\/republishing-guidelines --><\/p>\n<p><noscript><img decoding=\"async\" loading=\"lazy\" style=\"border: none !important;margin: 0 !important;max-height: 1px !important;max-width: 1px !important;min-height: 1px !important;min-width: 1px !important;padding: 0 !important\" src=\"https:\/\/counter.theconversation.com\/content\/166554\/count.gif?distributor=republish-lightbox-basic\" alt=\"The Conversation\" width=\"1\" height=\"1\" class=\"\" srcset=\"\"\/><\/noscript><\/p>\n<p><em>Article by <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/profiles\/mark-ziemann-1262908\">Mark Ziemann<\/a>, Lecturer in Biotechnology and Bioinformatics, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/institutions\/deakin-university-757\">Deakin University<\/a> and <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/profiles\/mandhri-abeysooriya-1264035\">Mandhri Abeysooriya<\/a>, <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/institutions\/deakin-university-757\">Deakin University<\/a><\/em><\/p>\n<p><em>This article is republished from <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\">The Conversation<\/a> under a Creative Commons license. Read the <a rel=\"nofollow noopener\" target=\"_blank\" href=\"https:\/\/theconversation.com\/excel-autocorrect-errors-still-plague-genetic-research-raising-concerns-over-scientific-rigour-166554\">original article<\/a>.<\/em><\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/excel-autocorrect-errors-plaguing-gene-research-syndication\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;#Excel autocorrect errors are plaguing gene research&#8221; Auto-correction, or predictive text, is a common feature of many modern tech tools, from internet searches to messaging apps and word processors. Auto-correction can be a blessing, but when the algorithm makes mistakes it can change the message in dramatic and sometimes hilarious ways. 
Our research shows autocorrect&#8230;<\/p>\n","protected":false},"author":1,"featured_media":332369,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/img-cdn.tnwcdn.com\/image\/tnw?filter_last=1&fit=1280,640&url=https:\/\/cdn0.tnwcdn.com\/wp-content\/blogs.dir\/1\/files\/2021\/08\/Excelhed.jpg&signature=266c8f053ff78395489f6416d8611348","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-332368","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/332368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=332368"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/332368\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/332369"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=332368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=332368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=332368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}