OpenAI GPT 4o ranked as best AI model for writing Solidity smart contract code by IQ


SolidityBench by IQ has launched as the first leaderboard to evaluate LLMs in Solidity code generation. Available on Hugging Face, it introduces two new benchmarks, NaïveJudge and HumanEval for Solidity, designed to assess and rank the proficiency of AI models in generating smart contract code.

Developed by IQ’s BrainDAO as part of its forthcoming IQ Code suite, SolidityBench serves to refine IQ’s own EVMind LLMs and compare them against generalist and community-created models. IQ Code aims to offer AI models tailored for generating and auditing smart contract code, addressing the growing need for secure and efficient blockchain applications.

As IQ told CryptoSlate, NaïveJudge takes a novel approach by tasking LLMs with implementing smart contracts based on detailed specifications derived from audited OpenZeppelin contracts. These contracts provide a gold standard for correctness and efficiency. The generated code is evaluated against a reference implementation using criteria such as functional completeness, adherence to Solidity best practices and security standards, and optimization efficiency.

The evaluation process uses advanced LLMs, including different versions of OpenAI’s GPT-4 and Claude 3.5 Sonnet, as impartial code reviewers. They assess the code against rigorous criteria, including implementation of all key functionality, handling of edge cases, error management, proper syntax usage, and overall code structure and maintainability.

Optimization considerations such as gas efficiency and storage management are also evaluated. Scores range from 0 to 100, providing a comprehensive assessment across functionality, security, and efficiency, mirroring the complexities of professional smart contract development.
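To make the 0 to 100 scoring concrete, here is a minimal sketch of how per-criterion scores from several reviewer models could be folded into a single number. The article names the criteria but does not say how NaïveJudge weights or combines them, so the `WEIGHTS` values, the `judge_score` function, and the sample reviews are all hypothetical and not IQ’s actual method.

```python
from statistics import mean

# Hypothetical weights: the article lists these criteria but not their weighting.
WEIGHTS = {"functionality": 0.5, "security": 0.3, "efficiency": 0.2}


def judge_score(reviews: list[dict[str, float]]) -> float:
    """Combine per-criterion 0-100 scores from several reviewers into one 0-100 score."""
    for review in reviews:
        for name in WEIGHTS:
            if not 0 <= review[name] <= 100:
                raise ValueError(f"{name} score out of range: {review[name]}")
    # Average each criterion across reviewers, then take the weighted sum.
    return sum(w * mean(r[name] for r in reviews) for name, w in WEIGHTS.items())


# Made-up scores from two reviewer models for one generated contract.
reviews = [
    {"functionality": 80, "security": 70, "efficiency": 60},
    {"functionality": 90, "security": 50, "efficiency": 100},
]
print(round(judge_score(reviews), 2))
```

Averaging across reviewers before weighting keeps any single reviewer model from dominating the result, which is one plausible reason to use more than one judge.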

Which AI models are best for Solidity smart contract development?

Benchmarking results showed that OpenAI’s GPT-4o model achieved the highest overall score of 80.05, with a NaïveJudge score of 72.18 and HumanEval for Solidity pass rates of 80% at pass@1 and 92% at pass@3.

Interestingly, newer reasoning models like OpenAI’s o1-preview and o1-mini were beaten to the top spot, scoring 77.61 and 75.08, respectively. Models from Anthropic and xAI, including Claude 3.5 Sonnet and grok-2, demonstrated competitive performance with overall scores hovering around 74. Nvidia’s Llama-3.1-Nemotron-70B scored lowest in the top 10 at 52.54.

SolidityBench scores for LLMs (Hugging Face)

Per IQ, HumanEval for Solidity adapts OpenAI’s original HumanEval benchmark from Python to Solidity, encompassing 25 tasks of varying difficulty. Each task includes corresponding tests compatible with Hardhat, a popular Ethereum development environment, facilitating accurate compilation and testing of generated code. The evaluation metrics, pass@1 and pass@3, measure the model’s success on the first attempt and within three attempts, offering insights into both precision and problem-solving capabilities.
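For readers unfamiliar with pass@k, the sketch below uses the unbiased estimator from OpenAI’s original HumanEval work: for a task with n generated samples of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). Whether SolidityBench uses this exact estimator or simply counts tasks solved within k attempts is an assumption here; the function names and the sample results are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n samples generated, c of them passed the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that k samples drawn
    # without replacement from the n are all failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over all tasks, as a percentage. Each entry is (n_samples, n_passed)."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Made-up results for 25 tasks with 3 samples each (not real leaderboard data):
# 20 tasks pass on every sample, 3 pass on one sample, 2 never pass.
results = [(3, 3)] * 20 + [(3, 1)] * 3 + [(3, 0)] * 2
print(round(benchmark_pass_at_k(results, k=1), 1))
print(round(benchmark_pass_at_k(results, k=3), 1))
```

With 25 tasks, each task is worth 4 percentage points when every sample agrees, so under this reading a pass@3 of 92% corresponds to 23 of 25 tasks solved within three attempts.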

Goals of using AI models in smart contract development

By introducing these benchmarks, SolidityBench seeks to advance AI-assisted smart contract development. It encourages the creation of more sophisticated and reliable AI models while providing developers and researchers with valuable insights into AI’s current capabilities and limitations in Solidity development.

The benchmarking toolkit aims to advance IQ Code’s EVMind LLMs and also to set new standards for AI-assisted smart contract development across the blockchain ecosystem. The initiative hopes to address a critical need in the industry, where the demand for secure and efficient smart contracts continues to grow.

Developers, researchers, and AI enthusiasts are invited to explore and contribute to SolidityBench, which aims to drive the continuous refinement of AI models, promote best practices, and advance decentralized applications.

Visit the SolidityBench leaderboard on Hugging Face to learn more and begin benchmarking Solidity generation models.
