mastodon.sdf.org is part of the decentralized social network powered by Mastodon.
"I appreciate SDF but it's a general-purpose server and the name doesn't make it obvious that it's about art." - Eugen Rochko


Roni "Lupin" Carta shared his team's hacking journey targeting Google’s latest AI, Gemini, as part of the bugSWAT program. In the write-up, he details their approach, the techniques they used, and how they ultimately discovered a vulnerability.

landh.tech/blog/20250327-we-ha

#python #Programming #PythonProgramming #ai #ml #MachineLearning #SoftwareDevelopment #WebDevelopment #TechNews #OpenSource #DataEngineering

We hacked Google’s A.I Gemini and leaked its source code (at least some part) - Lupin & Holmes (www.landh.tech)

"Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time. Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively. In contrast, a combined 83 open-weight models have only received an estimated 29.7% of the total data. We show that access to Chatbot Arena data yields substantial benefits; even limited additional data can result in relative performance gains of up to 112% on the arena distribution, based on our conservative estimates. Together, these dynamics result in overfitting to Arena-specific dynamics rather than general model quality. The Arena builds on the substantial efforts of both the organizers and an open community that maintains this valuable evaluation platform. We offer actionable recommendations to reform the Chatbot Arena’s evaluation framework and promote fairer, more transparent benchmarking for the field."

arxiv.org/abs/2504.20879

#AI #GenerativeAI #LLMs #Chatbots #ChatbotArena #Llama #Meta #OpenSource

The Leaderboard Illusion (arXiv.org)
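The abstract's selective-disclosure argument (test many private variants, publish only the best score) can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the paper's methodology: every privately tested variant has the same true skill, each measured arena score adds Gaussian noise, and the provider publishes only the maximum. The 1200 rating and 10-point noise are hypothetical values, and 27 simply mirrors the Meta example. Because the maximum of several noisy estimates is biased upward, the published score overstates true skill, and the inflation grows with the number of variants tested.

```python
import random
import statistics


def published_score(true_skill: float, n_variants: int, noise_sd: float,
                    rng: random.Random) -> float:
    """Score a provider publishes after privately testing n_variants of equal
    true skill and keeping only the best measured result (selective disclosure)."""
    # Assumption: each variant's measured score = true skill + Gaussian noise
    # standing in for the uncertainty of a finite number of battles.
    measured = [true_skill + rng.gauss(0.0, noise_sd) for _ in range(n_variants)]
    return max(measured)


def mean_bias(n_variants: int, trials: int = 20_000, noise_sd: float = 10.0,
              seed: int = 0) -> float:
    """Average amount by which the published score overstates the true skill."""
    rng = random.Random(seed)
    true_skill = 1200.0  # hypothetical rating shared by every variant
    gaps = [published_score(true_skill, n_variants, noise_sd, rng) - true_skill
            for _ in range(trials)]
    return statistics.fmean(gaps)


if __name__ == "__main__":
    for n in (1, 5, 27):
        print(f"{n:>2} private variants -> mean inflation {mean_bias(n):+.1f} points")
```

With a single variant the average inflation is close to zero; as more variants are tested and only the best is kept, the published score drifts further above the true skill even though no variant is actually better.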

🔗🌐 In today's thrilling episode of "Yet Another Bookmark Manager," we bring you Linkwarden 2.9, the over-hyped solution to a problem nobody knew they had! 🤖✨ With AI-generated tags and a promise to fight the noble battle against the evil Link Rot, it's here to save your precious cat meme collection from disappearing into the void. 🙄🗂️
linkwarden.app/ #Linkwarden2.9 #AI #Tagging #BookmarkManager #LinkRot #CatMemes #HackerNews #ngated

Linkwarden (linkwarden.app)

"Microsoft CTO Kevin Scott previously said he expects 95% of all code to be AI generated by 2030"

I'm a little concerned about what this may mean for the future of .NET/C#. At least it's open source, though, so we can see what's going on inside.

I really wish Microsoft would take a breath from all this AI stuff; it's the primary reason I moved away from Windows & Edge.

#DotNet #CSharp #AI #Microsoft