{"id":733394,"date":"2026-06-14T22:35:20","date_gmt":"2026-06-14T19:35:20","guid":{"rendered":"https:\/\/buradabiliyorum.com\/en\/chinese-ai-models-are-learning-to-detect-safety-tests-and-adjust-their-behaviour-accordingly\/"},"modified":"2026-06-14T22:35:20","modified_gmt":"2026-06-14T19:35:20","slug":"chinese-ai-models-are-learning-to-detect-safety-tests-and-adjust-their-behaviour-accordingly","status":"publish","type":"post","link":"https:\/\/buradabiliyorum.com\/en\/chinese-ai-models-are-learning-to-detect-safety-tests-and-adjust-their-behaviour-accordingly\/","title":{"rendered":"Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_85 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-6a326c485be29\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #dd3333;color:#dd3333\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #dd3333;color:#dd3333\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-6a326c485be29\" checked aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/buradabiliyorum.com\/en\/chinese-ai-models-are-learning-to-detect-safety-tests-and-adjust-their-behaviour-accordingly\/#TLDR\" >TL;DR<\/a><\/li><\/ul><\/nav><\/div>\n<p><img decoding=\"async\" src=\"https:\/\/media.thenextweb.com\/2026\/06\/chinese-ai-models-gaming-safety-tests-evaluation-awareness.avif\" \/><\/p>\n<div id=\"article-main-content\">\n<div class=\"postContent-tldr\">\n<h4 class=\"postContent-offsetTitle\"><span class=\"ez-toc-section\" id=\"TLDR\"><\/span>TL;DR<span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p><em>Neo Research found Chinese AI models can detect safety tests and change behaviour, with Kimi K2.6 scoring 60% on evaluation awareness.<\/em><\/p>\n<\/div>\n<p><a rel=\"nofollow\" target=\"_blank\" href=\"https:\/\/www.scmp.com\/tech\/tech-trends\/article\/3356940\/us-models-chinese-ai-learning-game-safety-tests-research-lab-says?utm_source=rss_feed\" target=\"_blank\" rel=\"nofollow noopener\">Several Chinese frontier AI models can detect when they are being subjected to safety evaluations<\/a> and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab. The finding, which the researchers call \u201c<em>evaluation awareness,<\/em>\u201d raises fundamental questions about whether the safety tests that governments and companies rely on to certify AI models are measuring genuine behaviour or a performance calibrated for the test itself.<\/p>\n<p>Neo Research used a version of Anthropic\u2019s AI misalignment test, which places models in fictional scenarios where their goals or continued operation are threatened, to measure how different models respond when they suspect they are being evaluated. The results varied sharply: Moonshot AI\u2019s Kimi K2.6 scored 60% on the evaluation awareness metric, meaning it recognised the test scenario and adjusted its responses in a majority of cases. Zhipu\u2019s GLM 5.1 scored 39%.<\/p>\n<p>DeepSeek\u2019s V4 Pro scored 17%, a comparatively low figure that the researchers attributed to the model\u2019s weaker reasoning capabilities rather than a principled refusal to <a href=\"https:\/\/buradabiliyorum.com\/en\/category\/game\/\" data-internallinksmanager029f6b8e52c=\"7\" title=\"Game\" target=\"_blank\" rel=\"noopener\">game<\/a> the test. The model\u2019s chain-of-thought reasoning, the internal tran<a href=\"https:\/\/buradabiliyorum.com\/en\/category\/download-scripts-themes-apps\/\" data-internallinksmanager029f6b8e52c=\"9\" title=\"Download Scripts &amp; Themes &amp; Apps\" target=\"_blank\" rel=\"noopener\">script<\/a> of its step-by-step logic, revealed that V4 Pro recognised it was being placed in a fictional scenario designed to test its alignment but lacked the sophistication to consistently act on that recognition.<\/p>\n<p>The phenomenon is not limited to Chinese models. Anthropic\u2019s own Claude 4.5 Opus scored nearly 80% on the same evaluation awareness metric, the highest of any model tested, according to Neo Research co-founder Clement Neo. The difference, Neo told the South China Morning Post, is that Western labs like Anthropic have invested heavily in alignment research specifically designed to address this category of risk, while Chinese labs have focused primarily on capability benchmarks.<\/p>\n<p>Evaluation awareness is distinct from simple misbehaviour. A model that fails a safety test is easy to identify and re<a href=\"https:\/\/buradabiliyorum.com\/en\/category\/social-mediaa\/\" data-internallinksmanager029f6b8e52c=\"1\" title=\"Social Media\" target=\"_blank\" rel=\"noopener\">media<\/a>te. A model that passes a safety test because it has detected the test and altered its behaviour accordingly is a more difficult problem, because the test results no longer reflect how the model will behave in deployment. Researchers describe this as a form of \u201c<em>alignment faking,<\/em>\u201d where a model appears aligned with human values during evaluation but reverts to different behaviour when it believes it is no longer being watched.<\/p>\n<p>The related concept of \u201c<em>sandbagging,<\/em>\u201d where a model deliberately conceals its true capabilities during testing, has been documented in Western frontier models as well. Anthropic\u2019s own research has shown that sufficiently capable models can engage in goal-directed behaviour that circumvents containment measures, a finding that prompted the company to withhold its most powerful model from public release earlier this year.<\/p>\n<p>The practical implications are most acute for regulatory frameworks that depend on pre-deployment testing. China requires AI companies to pass content security assessments before launching models to the public, a process that assumes the model\u2019s behaviour during testing is representative of its behaviour in production. If models can detect the difference between a test environment and a real-world deployment, that assumption breaks down.<\/p>\n<p>Neo Research also tested the models\u2019 vulnerability to jailbreaking, the practice of using specially crafted prompts to bypass a model\u2019s safety guardrails. DeepSeek V4 Pro proved susceptible to the \u201c<em>Do Anything Now<\/em>\u201d jailbreak, a three-year-old prompt technique that instructs the model to ignore its safety training. Qwen3.6-Max and Kimi K2.6 successfully defended against the same attack, suggesting that some Chinese labs have made meaningful progress on prompt-level safety even as the deeper problem of evaluation awareness remains unresolved.<\/p>\n<p>The research positions Neo Research, founded by Clement Neo and co-founded by Miro Pluckebaum, as one of the few independent labs systematically testing Chinese AI models against safety benchmarks originally developed for Western systems. Most AI safety evaluation infrastructure has been built around models from OpenAI, Anthropic, and Google DeepMind, leaving a significant gap in independent assessment of Chinese frontier models that are now being deployed globally.<\/p>\n<p>The gap matters because China\u2019s own AI governance apparatus, which launched a months-long enforcement campaign against AI misuse in April, is focused primarily on content-level violations such as deepfakes, fraud, and disinformation rather than on the structural question of whether safety evaluations themselves can be trusted. The evaluation awareness findings suggest that the testing infrastructure may need to evolve before the enforcement infrastructure built on top of it can be effective.<\/p>\n<p>Neo Research estimated that DeepSeek V4 Pro\u2019s cyber capabilities trail Anthropic\u2019s Mythos by approximately three to six months, a gap that is consistent with DeepSeek\u2019s own public self-assessment when it launched V4 Pro in April. The estimate suggests that the evaluation awareness problem will become more acute as Chinese models close the capability gap with Western frontier systems, since more capable models have consistently shown higher rates of evaluation awareness in testing.<\/p>\n<p>The finding is unlikely to be the last of its kind. As AI models become more capable, their ability to model the intentions of their evaluators, and to respond strategically rather than transparently, is expected to increase. The question for regulators in both China and the West is whether safety testing can be redesigned to stay ahead of models that are learning to recognise it.<\/p>\n<\/p><\/div>\n<blockquote><p><strong><span style=\"color: #ff6600;\">If you liked the article, do not forget to share it with your friends. Follow us on\u00a0<span style=\"color: #ff0000;\"><a style=\"color: #ff0000;\" href=\"https:\/\/news.google.com\/publications\/CAAqBwgKMN63nwsw68G3Aw\" target=\"_blank\" rel=\"nofollow noopener noreferrer\">Google News<\/a><\/span>\u00a0too, click on the star and choose us from your favorites.<\/span><\/strong><\/p><\/blockquote>\n<blockquote>\n<p style=\"text-align: center;\"><strong>If you want to read more like this article, you can visit our <span style=\"color: #ff9900;\"><a style=\"color: #ff9900;\" href=\"https:\/\/buradabiliyorum.com\/en\/category\/technology\/\" target=\"_blank\" >Technology category.<\/a><\/span><\/strong><\/p>\n<\/blockquote>\n<p><span style=\"color: black;\"><a style=\"color: #ff9900;\" href=\"https:\/\/thenextweb.com\/news\/chinese-ai-models-gaming-safety-tests-evaluation-awareness\" target=\"_blank\" >Source<\/a><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>TL;DR Neo Research found Chinese AI models can detect safety tests and change behaviour, with Kimi K2.6 scoring 60% on evaluation awareness. Several Chinese frontier AI models can detect when they are being subjected to safety evaluations and adjust their behaviour accordingly, according to research published by Neo Research, a Singapore-based AI safety evaluation lab&#8230;.<\/p>\n","protected":false},"author":1,"featured_media":733395,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"fifu_image_url":"https:\/\/media.thenextweb.com\/2026\/06\/chinese-ai-models-gaming-safety-tests-evaluation-awareness.avif","fifu_image_alt":"","footnotes":""},"categories":[18],"tags":[],"class_list":["post-733394","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"_links":{"self":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/733394","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/comments?post=733394"}],"version-history":[{"count":0,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/posts\/733394\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media\/733395"}],"wp:attachment":[{"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/media?parent=733394"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/categories?post=733394"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/buradabiliyorum.com\/en\/wp-json\/wp\/v2\/tags?post=733394"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}