# Even some of the best AI can’t beat this new benchmark

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides data labeling and AI development services, have released a challenging new benchmark for frontier AI systems. The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, the humanities, and the natural sciences. To make…

# AI benchmarking organization criticized for waiting to disclose funding from OpenAI

An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community. Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath….

# Chinese AI company MiniMax releases new models it claims are competitive with the industry’s best

Chinese firms continue to release AI models that rival the capabilities of systems developed by OpenAI and other U.S.-based AI companies. This week, MiniMax, an Alibaba- and Tencent-backed startup that has raised around $850 million in venture capital and is valued at more than $2.5 billion, debuted three new models: MiniMax-Text-01, MiniMax-VL-01, and T2A-01-HD. MiniMax-Text-01…
