Which Is the Strongest 'AI Reporter'? First Edition of NBD's AI Model Evaluation Report Released

NBD

Generative AI models are changing how media content is created and distributed. With so many models to choose from, media professionals and content creators must decide which one best fits a given scenario. To help with that choice, the "NBD AI Model Evaluation Team," made up of more than 30 outstanding journalists, editors, and engineers from National Business Daily (NBD), spent two months reviewing mainstream AI models in financial news scenarios and has now released the "NBD AI Model Evaluation Report" (1st Edition).

The report finds that Chinese domestic AI models are rapidly catching up with their overseas counterparts. Yi-Large from 01.AI (Lingyi Wanwu) emerged as a dark horse, ranking first in "Financial News Headline Creation," "Weibo News Writing," "Article Proofreading," and "Financial Data Calculation and Analysis." DeepSeek-V2 from High-Flyer and BaiChuan4 from BaiChuan Intelligence also demonstrated strong data calculation and analysis capabilities. The widely acclaimed GPT 4.0, by contrast, underperformed, ranking last in "Financial News Headline Creation."

National Business Daily (NBD), a leading Chinese financial media outlet, has made AI part of its strategy since 2020 and has launched a suite of AI products that have been well received by the market. Its expertise in both financial journalism and AI technology provides a solid foundation for the AI model evaluation.

The evaluation team selected 15 mainstream AI models, including GPT 4.0, Baidu's Wenxin, and Moonshot AI, and tested them in four key financial news scenarios using the "Yan Zhi Xuan AI Creation +" platform developed by NBD Technology. The results were then verified, scored, and ranked by 15 senior journalists and editors.

Yi-Large stood out, followed closely by Anthropic's Claude 3 Opus and DeepSeek-V2. Performance varied significantly from task to task. GPT 4.0's unexpectedly poor performance highlights the challenge of adapting across languages and cultures, while domestic models show a natural advantage in localized applications.

The report concludes that domestic models are gaining ground, with some excelling in specific scenarios, such as financial data analysis. It also points out the importance of information extraction capabilities, which vary among models and are crucial for accurate news generation.

The NBD AI Model Evaluation Team will continue to explore the potential of AI models and will share its findings in regular professional reports. The team invites participation from developers eager to showcase their models and from users looking to explore AI model applications in various scenarios.

Editor: Gao Han
