We're introducing a new model that unlocks even more of what Codex can do: GPT-5.3-Codex, the most capable agentic coding model to date. The model advances both the frontier coding performance of GPT-5.2-Codex and the reasoning and professional knowledge capabilities of GPT-5.2, together in one model, which is also 25% faster. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while it's working, without losing context.
GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug its own training, manage its own deployment, and diagnose test results and evaluations; our team was blown away by how much Codex was able to accelerate its own development.
With GPT-5.3-Codex, Codex goes from an agent that can write and review code to an agent that can do nearly anything developers and professionals can do on a computer.
GPT-5.3-Codex sets a new industry high on SWE-Bench Pro and Terminal-Bench, and shows strong performance on OSWorld and GDPval, four benchmarks we use to measure coding, agentic, and real-world capabilities.
GPT-5.3-Codex achieves state-of-the-art performance on SWE-Bench Pro, a rigorous evaluation of real-world software engineering. Where SWE-bench Verified only tests Python, SWE-Bench Pro spans four languages and is more contamination-resistant, challenging, diverse, and industry-relevant. It also far exceeds the previous state-of-the-art performance on Terminal-Bench 2.0, which measures the terminal skills a coding agent like Codex needs. Notably, GPT-5.3-Codex does so with fewer tokens than any prior model, letting users build more.
Combining frontier coding capabilities, improvements in aesthetics, and compaction results in a model that can do striking work, building highly functional, complex games and apps from scratch over the course of days. To test the model's web development and long-running agentic capabilities, we asked GPT-5.3-Codex to build us two games: version two of the racing game from the Codex app launch, and a diving game. Using the develop web game skill and preselected, generic follow-up prompts like "fix the bug" or "improve the game", GPT-5.3-Codex iterated on the games autonomously over millions of tokens. Watch the trailers and play the games for yourself to see what Codex can do.
GPT-5.3-Codex also better understands your intent when you ask it to make day-to-day websites, compared to GPT-5.2-Codex. Simple or underspecified prompts now default to sites with more functionality and sensible defaults, giving you a stronger starting canvas to bring your ideas to life.
For example, we asked GPT-5.3-Codex and GPT-5.2-Codex to build the two landing pages below. GPT-5.3-Codex automatically showed the yearly plan as a discounted monthly price, making the discount feel clear and intentional, instead of multiplying the yearly total. It also made an automatically transitioning testimonial carousel with three distinct user quotes rather than one, resulting in a page that feels more complete and production-ready by default.
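The pricing choice described above comes down to a small piece of arithmetic: display the yearly plan as its per-month equivalent rather than as a twelve-month lump sum. A minimal sketch, assuming a hypothetical 20% annual discount (the function name and numbers are illustrative, not taken from the generated pages):

```python
def yearly_as_monthly(monthly_price: float, yearly_discount: float = 0.20) -> float:
    """Return the per-month price implied by a discounted yearly plan.

    Hypothetical illustration: the discount rate and function are
    assumptions for this example, not the model's actual output.
    """
    yearly_total = monthly_price * 12 * (1 - yearly_discount)
    return round(yearly_total / 12, 2)

# A $10/month plan with a 20% annual discount displays as
# "$8.00/month, billed yearly" instead of a $96 lump sum.
print(yearly_as_monthly(10.0))  # 8.0
```

Showing the monthly-equivalent figure keeps the two plans directly comparable, which is what makes the discount read as intentional.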
Software engineers, designers, product managers, and data scientists do far more than generate code. GPT-5.3-Codex is built to support all of the work in the software lifecycle: debugging, deploying, monitoring, writing PRDs, editing copy, user research, tests, metrics, and more. Its agentic capabilities go beyond software, helping you build whatever you want to build, whether that's making slide decks or analyzing data in spreadsheets.
With custom skills similar to those used for our previous GDPval results, GPT-5.3-Codex also shows strong performance on professional knowledge work as measured by GDPval, matching GPT-5.2. GDPval is an evaluation OpenAI released in 2025 that measures a model's performance on well-specified knowledge-work tasks across 44 occupations. These tasks include things like making presentations, spreadsheets, and other work products.
Below are a few examples of the work the agent produced.
OSWorld is an agentic computer-use benchmark where the agent has to complete productivity tasks in a visual desktop computer environment. GPT-5.3-Codex demonstrates far stronger computer-use capabilities than previous GPT models.
Together, these results across coding, frontend, computer-use, and real-world tasks show that GPT-5.3-Codex isn't just better at individual tasks; it marks a step change toward a single, general-purpose agent that can reason, build, and execute across the full spectrum of real-world technical work.
As model capabilities become more powerful, the gap shifts from what agents are capable of doing to how easily humans can interact with, direct, and supervise many of them working in parallel. The Codex app makes managing and directing agents much easier, and now with GPT-5.3-Codex it's more interactive. With the new model, Codex provides frequent updates so you stay apprised of key decisions and progress as it works. Instead of waiting for a final output, you can interact in real time: ask questions, discuss approaches, and steer toward the solution. GPT-5.3-Codex talks through what it's doing, responds to feedback, and keeps you in the loop from start to finish.
To enable steering while the model works, go to Settings > General > Follow-up behavior in the app.
The recent rapid Codex improvements build on the fruits of research projects spanning months or years across all of OpenAI. These projects are in turn being accelerated by Codex, with many researchers and engineers at OpenAI describing their job today as fundamentally different from what it was just two months ago. Even early versions of GPT-5.3-Codex demonstrated exceptional capabilities, allowing our team to use those earlier versions to improve training and support the deployment of later versions.
Codex is useful for a very broad range of tasks, making it difficult to fully enumerate the ways in which it helps our teams. As one example, the research team used Codex to monitor and debug the training run for this release. It accelerated research beyond debugging infrastructure problems: it helped track patterns throughout the course of training, provided deep analysis of interaction quality, proposed fixes, and built rich applications that let human researchers precisely understand how the model's behavior differed from prior models.
The engineering team used Codex to optimize and adapt the harness for GPT-5.3-Codex. When we started seeing strange edge cases impacting users, team members used Codex to identify context-rendering bugs and root-cause low cache-hit rates. GPT-5.3-Codex is continuing to help the team throughout the launch by dynamically scaling GPU clusters to adjust to traffic surges and keep latency stable.
During alpha testing, one researcher wanted to understand how much additional work GPT-5.3-Codex was getting done per turn and the associated difference in productivity. GPT-5.3-Codex came up with several simple regex classifiers to estimate the frequency of clarifications, positive and negative user responses, and progress on the task, then ran them at scale over all session logs and produced a report with its conclusions: people building with Codex were happier because the agent better understood their intent and made more progress per turn, with fewer clarifying questions.
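The classifier approach above can be sketched with a few regex patterns applied across session messages. This is a minimal illustration, assuming a plain list of message strings; the actual patterns, categories, and log format the model used are not public:

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; the real classifiers
# were generated by the model against internal session logs.
CLASSIFIERS = {
    "clarification": re.compile(r"\b(could you clarify|what do you mean|which file)\b", re.I),
    "positive": re.compile(r"\b(thanks|great|perfect|works)\b", re.I),
    "negative": re.compile(r"\b(wrong|broken|not what)\b", re.I),
}

def classify_sessions(messages: list[str]) -> Counter:
    """Count how many messages match each category (at most once per message per label)."""
    counts: Counter = Counter()
    for msg in messages:
        for label, pattern in CLASSIFIERS.items():
            if pattern.search(msg):
                counts[label] += 1
    return counts

logs = ["Thanks, that works!", "Could you clarify the schema?", "That's wrong."]
print(classify_sessions(logs))
```

Counting matches per message keeps the estimate simple and cheap enough to run over an entire corpus of session logs, at the cost of regex-level precision.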
Because GPT-5.3-Codex is so different from its predecessors, the data from alpha testing exhibited numerous unusual and counterintuitive results. A data scientist on the team worked with GPT-5.3-Codex to build new data pipelines and visualize the results far more richly than our standard dashboarding tools allowed. The results were co-analyzed with Codex, which concisely summarized key insights across thousands of data points in under three minutes.
Individually, all of these tasks are interesting examples of how Codex can help researchers and product builders. Taken together, we found that these new capabilities resulted in powerful acceleration of our research, engineering, and product teams.
Over recent months, we've seen meaningful gains in model performance on cybersecurity tasks, benefiting both developers and security professionals. In parallel, we've been preparing strengthened cyber safeguards to support defensive use and broader ecosystem resilience.
GPT-5.3-Codex is the first model we classify as High capability for cybersecurity-related tasks under our Preparedness Framework, and the first we've directly trained to identify software vulnerabilities. While we don't have definitive evidence it can automate cyber attacks end-to-end, we're taking a precautionary approach and deploying our most comprehensive cybersecurity safety stack to date. Our mitigations include safety training, automated monitoring, trusted access for advanced capabilities, and enforcement pipelines including threat intelligence.
Because cybersecurity is inherently dual-use, we're taking an evidence-based, iterative approach that accelerates defenders' ability to find and fix vulnerabilities while slowing misuse. As part of this, we're launching Trusted Access for Cyber, a pilot program to accelerate cyber defense research.
We're investing in ecosystem safeguards such as expanding the private beta of Aardvark, our security research agent, as the first offering in our suite of Codex Security products and tools, and partnering with open-source maintainers to provide free codebase scanning for widely used projects such as Next.js, where a security researcher used Codex to find vulnerabilities disclosed last week.
Building on our $1M Cybersecurity Grant Program launched in 2023, we're also committing $10M in API credits to accelerate cyber defense with our most capable models, especially for open source software and critical infrastructure systems. Organizations engaged in good-faith security research can apply for API credits and support through our Cybersecurity Grant Program.
GPT-5.3-Codex is available with paid ChatGPT plans, everywhere you can use Codex: the app, CLI, IDE extension, and web. We are working to safely enable API access soon.
With this update, we are also now running GPT-5.3-Codex 25% faster for Codex users, thanks to improvements in our infrastructure and inference stack, resulting in faster interactions and faster results.
GPT-5.3-Codex was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems. We are grateful to NVIDIA for their partnership.
With GPT-5.3-Codex, Codex is moving beyond writing code to using it as a tool to operate a computer and complete work end to end. By pushing the frontier of what a coding agent can do, we're also unlocking a broader class of knowledge work: from building and deploying software to researching, analyzing, and executing complex tasks. What started as a focus on being the best coding agent has become the foundation for a more general collaborator on the computer, expanding both who can build and what's possible with Codex.
| Benchmark | GPT-5.3-Codex (xhigh) | GPT-5.2-Codex (xhigh) | GPT-5.2 (xhigh) |
| --- | --- | --- | --- |
| SWE-Bench Pro (Public) | 56.8% | 56.4% | 55.6% |
| Terminal-Bench 2.0 | 77.3% | 64.0% | 62.2% |
| OSWorld-Verified | 64.7% | 38.2% | 37.9% |
| GDPval (wins or ties) | 70.9% | - | 70.9% (high) |
| Cybersecurity Capture The Flag Challenges | 77.6% | 67.4% | 67.7% |
| SWE-Lancer IC Diamond | 81.4% | 76.0% | 74.6% |