DeepSeek R1 vs. OpenAI O1 vs. Claude 3.5: Who Reigns Supreme in AI Coding?
A recent performance comparison of three leading AI models, DeepSeek-R1, OpenAI O1, and Claude 3.5 Sonnet, sheds light on their strengths and weaknesses in coding tasks. The challenge, a Python programming exercise on the Exercism platform, tested each model's ability to implement a REST API, which requires handling complex logic, JSON data, string processing, and balance calculations.
Aider Coding Benchmark Rankings
In the Aider coding benchmark rankings, the models are positioned as follows:
OpenAI O1: Takes the top spot with impressive speed and initial accuracy.
DeepSeek R1: Secures second place, with a notable improvement in performance, moving from 45% to 52%.
Claude 3.5 Sonnet: Ranks below DeepSeek R1, although it can improve when given error feedback.
DeepSeek 3: Ranks below Sonnet in overall performance.
The Challenge: REST API Exercise
The REST API Python challenge required models to:
Implement IOU API endpoints.
Understand API design principles.
Process JSON data…
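To make the task concrete, here is a minimal Python sketch of an in-memory IOU tracker of the kind the exercise describes. It assumes the interface of Exercism's `rest-api` exercise as we understand it: a `RestAPI` class whose `get` and `post` methods take a URL plus a JSON-string payload and return JSON strings, with `/users`, `/add`, and `/iou` routes and users stored as `name`/`owes`/`owed_by`/`balance` records. Treat those route and field names as assumptions rather than the official specification; the `_record` helper is hypothetical.

```python
import json


class RestAPI:
    """In-memory IOU tracker; a sketch, not a reference solution."""

    def __init__(self, database=None):
        # Assumed input shape: {"users": [{"name", "owes", "owed_by", "balance"}, ...]}.
        # Records are copied so the caller's dictionaries are never mutated.
        self.users = {}
        for user in (database or {"users": []})["users"]:
            self.users[user["name"]] = {
                "name": user["name"],
                "owes": dict(user.get("owes", {})),
                "owed_by": dict(user.get("owed_by", {})),
                "balance": user.get("balance", 0.0),
            }

    def get(self, url, payload=None):
        if url != "/users":
            raise ValueError(f"unknown GET route: {url}")
        # With a payload, return only the requested users; otherwise return all.
        names = json.loads(payload)["users"] if payload else list(self.users)
        return json.dumps({"users": [self.users[n] for n in sorted(names)]})

    def post(self, url, payload=None):
        data = json.loads(payload)
        if url == "/add":
            name = data["user"]
            self.users[name] = {"name": name, "owes": {}, "owed_by": {}, "balance": 0.0}
            return json.dumps(self.users[name])
        if url == "/iou":
            lender, borrower = data["lender"], data["borrower"]
            self._record(lender, borrower, data["amount"])
            pair = [self.users[n] for n in sorted((lender, borrower))]
            return json.dumps({"users": pair})
        raise ValueError(f"unknown POST route: {url}")

    def _record(self, lender, borrower, amount):
        # Hypothetical helper: net the new loan against any existing debt
        # between the two users. A positive net means the borrower owes the lender.
        lend, borr = self.users[lender], self.users[borrower]
        net = lend["owed_by"].pop(borrower, 0) - lend["owes"].pop(borrower, 0) + amount
        # Clear the mirror entries on the borrower's side before rewriting them.
        borr["owes"].pop(lender, None)
        borr["owed_by"].pop(lender, None)
        if net > 0:
            lend["owed_by"][borrower] = net
            borr["owes"][lender] = net
        elif net < 0:
            lend["owes"][borrower] = -net
            borr["owed_by"][lender] = -net
        # Balance = total owed to the user minus total the user owes.
        for user in (lend, borr):
            user["balance"] = float(sum(user["owed_by"].values()) - sum(user["owes"].values()))
```

For example, if Adam lends Bob 3.0 and Bob later lends Adam 5.0, the two debts net out so that Adam owes Bob 2.0, with balances of -2.0 and 2.0. The netting step is where most of the "complex logic" sits: the sketch collapses a pair's mutual debts into one signed amount before writing it back, so `owes` and `owed_by` never both list the same counterparty.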