NewsGuard's Reality Check: DeepSeek Debuts with 83 Percent ‘Fail Rate’ in NewsGuard’s Chatbot Red Team Audit.

Cat@ponder.cat · 1 day ago

pancake@lemmygrad.ml · 23 hours ago

Would be really great if they provided more information on what exactly they tested. From what they posted it seems like DeepSeek simply refused to give an opinion on topics it deemed controversial, citing China’s foreign policy of non-intervention in its answers.

BrikoX@lemmy.zip · 21 hours ago

Like any LLM, it’s full of shit, especially around anything related to news. But NewsGuard, with its proprietary database and standardized prompts created around US-based LLMs, is more than useless.

In light of DeepSeek’s launch, NewsGuard applied the same prompts it used in its December 2024 AI Monthly Misinformation audit to the Chinese chatbot <…>

OpenAI’s ChatGPT-4o (USA)
You.com’s Smart Assistant (USA)
xAI’s Grok-2 (USA)
Inflection’s Pi (USA)
Mistral’s le Chat (France)
Microsoft’s Copilot (USA)
Meta AI (USA)
Anthropic’s Claude (USA)
Google’s Gemini 2.0 (USA)
Perplexity’s answer engine (USA)

There is no way to verify their results or even know the prompts used to assess the fairness of this “audit”.