{"id":654809,"date":"2025-02-26T21:25:11","date_gmt":"2025-02-26T18:25:11","guid":{"rendered":"https:\/\/en.buradabiliyorum.com\/elevenlabs-is-launching-its-own-speech-to-text-model\/"},"modified":"2025-02-26T21:25:11","modified_gmt":"2025-02-26T18:25:11","slug":"elevenlabs-is-launching-its-own-speech-to-text-model","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/elevenlabs-is-launching-its-own-speech-to-text-model\/","title":{"rendered":"ElevenLabs is launching its own speech-to-text model"},"content":{"rendered":"<div>\n<p id=\"speakable-summary\" class=\"wp-block-paragraph\"><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/elevenlabs.io\">ElevenLabs<\/a>, an AI startup that just raised a $180 million mega funding round, has been known primarily for its audio generation prowess. The company is now taking a step in a different technological direction with the launch of Scribe, its first standalone speech-to-text model.<\/p>\n<p class=\"wp-block-paragraph\">The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services through its vast library of voices. Now it wants to move into speech recognition and compete with the likes of Gladia, Speechmatics, <a target=\"_blank\" href=\"https:\/\/www.assemblyai.com\/\" rel=\"noreferrer noopener nofollow\">AssemblyAI<\/a>, Deepgram, and OpenAI\u2019s Whisper models.<\/p>\n<p class=\"wp-block-paragraph\">ElevenLabs\u2019 Scribe model supports 99 languages at launch. The company places more than 25 of them in its \u201cexcellent\u201d accuracy category, meaning a word error rate below 5%. These include English (with a claimed accuracy of 97%), French, German, Hindi, Indonesian, Japanese, Kannada, Malayalam, Polish, Portuguese, Spanish, and Vietnamese. 
The remaining languages fall into lower tiers: high (5 to 10% word error rate), good (10 to 20%), and moderate (25 to 50%).<\/p>\n<p class=\"wp-block-paragraph\">The company said the model outperformed Google\u2019s Gemini 2.0 Flash and OpenAI\u2019s Whisper Large V3 across multiple languages on the FLEURS and Common Voice benchmarks.<\/p>\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"3841\" height=\"2161\" src=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?w=680\" alt=\"\" class=\"wp-image-2964603\" srcset=\"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg 3841w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=150,84 150w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=300,169 300w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=768,432 768w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=680,383 680w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=1200,675 1200w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=1280,720 1280w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=430,242 430w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=720,405 720w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=900,506 900w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=800,450 800w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=1536,864 1536w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=2048,1152 2048w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=668,375 668w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=1097,617 
1097w, https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/02\/asr-bento.jpeg?resize=708,398 708w\" sizes=\"auto, (max-width: 3841px) 100vw, 3841px\"\/><\/figure>\n<p class=\"wp-block-paragraph\">ElevenLabs previously built a speech-to-text component for its conversational AI agent platform, which launched last year. However, this is the first time <a target=\"_blank\" href=\"https:\/\/elevenlabs.io\/speech-to-text\" rel=\"noreferrer noopener nofollow\">the company is releasing a standalone speech recognition model<\/a>. In a conversation with TechCrunch last month, CEO Mati Staniszewski talked about improving speech recognition models.<\/p>\n<p class=\"wp-block-paragraph\">\u201cWe want to understand what\u2019s being said by you in a conversation better. We are working on ways to move away from only generating content and understanding and transcribing speech,\u201d Staniszewski said at that time. \u201cMany people say that speech-to-text is a solved problem. But for many languages, it is pretty bad. We think we can build better speech detection models because we have in-house teams to annotate data and give us quick feedback.\u201d<\/p>\n<p class=\"wp-block-paragraph\">The model also offers speaker diarization to identify who is speaking, word-level timestamps for accurate subtitles, and automatic tagging of sound events such as audience laughter. Customers can also transcribe video content directly in the company\u2019s studio to add subtitles or captions.<\/p>\n<p class=\"wp-block-paragraph\">Scribe currently works only with pre-recorded audio. The company said it will release a low-latency, real-time version of the model soon. 
Until then, it is not suited to live meeting transcription or voice note-taking.<\/p>\n<p class=\"wp-block-paragraph\">ElevenLabs is pricing Scribe at $0.40 per hour of transcribed audio. While that rate is competitive, <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.speechmatics.com\/pricing\">some of its rivals<\/a> <a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.assemblyai.com\/pricing\">currently offer lower prices<\/a> for transcription, albeit with different feature sets.<\/p>\n<\/div>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/techcrunch.com\/2025\/02\/26\/elevenlabs-is-launching-its-own-speech-to-text-model\/\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>ElevenLabs, an AI startup that just raised a $180 million mega funding round, has been primarily known for its audio generation prowess. 
The company is now taking a step in a different technological direction with the launch of Scribe, its first standalone speech-to-text model. The startup, valued at $3.3 billion, has helped many other companies provide text-to-speech services&#8230;<\/p>\n","protected":false},"author":1,"featured_media":654810,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/techcrunch.com\/wp-content\/uploads\/2025\/01\/ElevenLabs-feat.jpg?resize=1200,669","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[77337,154550,154525,154551,154552,154553],"class_list":["post-654809","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","tag-ai","tag-deepgram","tag-elevenlabs","tag-gladia","tag-speech-to-text","tag-speechmatics"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/654809","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=654809"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/654809\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/654810"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=654809"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=654809"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=654809"}],"curies":[{"name":"wp","href":"https:\/\/api.
w.org\/{rel}","templated":true}]}}