OpenAI rolls out Deep Research in ChatGPT for expert-level report generation

OpenAI has introduced a new feature called Deep Research in ChatGPT, designed to tackle complex, multi-step internet research tasks. This advanced capability enables ChatGPT to complete in “tens of minutes what would take a human many hours,” according to the company.

What is Deep Research?

Deep Research is an autonomous agent that can independently search for, analyze, and synthesize hundreds of online sources to produce detailed, research analyst-level reports.

Powered by a version of the upcoming OpenAI o3 model, it is optimized for web browsing and data analysis. The feature leverages advanced reasoning to interpret and analyze vast amounts of text, images, and PDFs while adapting its approach based on the information it encounters.

OpenAI emphasizes that the ability to synthesize knowledge is a crucial step in the development of Artificial General Intelligence (AGI), which the company envisions as capable of producing novel scientific research.

Why OpenAI Built Deep Research

OpenAI developed Deep Research for professionals in fields such as finance, science, policy, and engineering, who need thorough, precise, and reliable research. It also serves consumers seeking hyper-personalized recommendations for purchases, such as cars, appliances, and furniture.

Every output generated by Deep Research is fully documented, featuring clear citations and a summary of its reasoning, making it easy to verify. The tool excels at uncovering niche, non-intuitive information that would typically require extensive browsing. OpenAI highlights that Deep Research saves valuable time by handling complex, time-consuming research tasks with a single query.

How Deep Research Works

Deep Research was trained using end-to-end reinforcement learning on challenging browsing and reasoning tasks across multiple domains. It can plan and execute multi-step research trajectories, adjusting its approach as needed based on new information.

The model can also browse user-uploaded files, plot graphs using Python, and embed images or graphs generated from websites into its responses. Furthermore, it cites specific sentences or passages from its sources, ensuring transparency.
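The plan, search, read, and adapt loop described above can be illustrated with a toy sketch. This is not OpenAI's implementation: the in-memory `TOY_WEB` corpus, the `research` function, and the `Report` type are hypothetical names invented here, and a simple queue of follow-up leads stands in for the model's learned planning.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory "web": every page and follow-up lead is made up for illustration.
TOY_WEB = {
    "commuter bikes": {
        "text": "Commuter bikes vary by frame and gearing. Belt drives need less upkeep.",
        "leads": ["belt drives"],
    },
    "belt drives": {
        "text": "Belt drives last longer than chains. They cost more up front.",
        "leads": [],
    },
}


@dataclass
class Report:
    findings: list = field(default_factory=list)  # (source, quoted sentence) pairs
    steps: list = field(default_factory=list)     # queries run, in order


def research(question: str, max_steps: int = 5) -> Report:
    """Follow leads from page to page, quoting one sentence per source."""
    report = Report()
    queue = [question]  # the current plan: queries still to run
    seen = set()
    while queue and len(report.steps) < max_steps:
        query = queue.pop(0)
        if query in seen:
            continue
        seen.add(query)
        report.steps.append(query)
        page = TOY_WEB.get(query)
        if page is None:
            continue  # dead end: nothing to cite, move on to the next lead
        # Cite a specific sentence from the source rather than paraphrasing it.
        first_sentence = page["text"].split(". ")[0].rstrip(".") + "."
        report.findings.append((query, first_sentence))
        # Adapt the plan based on what this page revealed.
        queue.extend(page["leads"])
    return report


report = research("commuter bikes")
for source, quote in report.findings:
    print(f"[{source}] {quote}")
```

The `steps` list plays the role of the sidebar summary of actions taken, and each entry in `findings` pairs a quoted sentence with its source, mirroring the citation behavior the article describes.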

On Humanity’s Last Exam, an evaluation that tests AI on more than 3,000 expert-level questions across more than 100 subjects, the model behind Deep Research scored 26.6% accuracy. That is well ahead of other models such as GPT-4o (3.3%) and OpenAI’s o3-mini (13%), with notable gains in subjects like chemistry, humanities, social sciences, and mathematics.

It also set a new benchmark on the GAIA public leaderboard for real-world problem-solving, excelling in reasoning, web browsing, and tool-use proficiency.
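To put the Humanity’s Last Exam figures in perspective, the short sketch below computes the gap in percentage points and the ratio between the scores quoted in this article. The variable names are illustrative; the three accuracy values are the only data used.

```python
# Accuracy on Humanity's Last Exam, in percent, as quoted in the article.
scores = {
    "Deep Research model": 26.6,
    "OpenAI o3-mini": 13.0,
    "GPT-4o": 3.3,
}

top = scores["Deep Research model"]
gaps = {}    # percentage-point lead over each other model
ratios = {}  # how many times higher the top score is

for name, accuracy in scores.items():
    if name == "Deep Research model":
        continue
    gaps[name] = top - accuracy
    ratios[name] = top / accuracy
    print(f"vs {name}: +{gaps[name]:.1f} points, {ratios[name]:.1f}x the accuracy")
```

By these numbers the model scores about twice as high as o3-mini and roughly eight times as high as GPT-4o, while still answering only about a quarter of the questions correctly.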

How to Use Deep Research

To use Deep Research in ChatGPT, simply select the ‘Deep Research’ option in the message composer and enter your query. Users can request anything from a competitive analysis of streaming platforms to a personalized report on the best commuter bike. You can also attach files or spreadsheets for added context.

Once the process begins, a sidebar displays a summary of the steps taken and the sources used. Research typically takes between 5 and 30 minutes to complete, during which users can continue with other tasks. Upon completion, the final report is delivered within the chat. OpenAI plans to add embedded images, data visualizations, and other analytical outputs in the coming weeks for enhanced clarity.

Deep Research vs. GPT-4o

While GPT-4o excels in real-time, multimodal conversations, Deep Research is better suited for multi-faceted, domain-specific inquiries that require depth and detail.

For example, when asked, “What’s the average retirement age for NFL kickers?”, Deep Research provides a thorough analysis that includes statistical context, supporting examples, and factors influencing the longevity of kickers, rather than simply offering a single number.

Performance on Benchmarks

On GAIA, a public benchmark evaluating AI on real-world questions, Deep Research achieved state-of-the-art (SOTA) performance, topping the external leaderboard. It also demonstrated significant improvements in expert-level tasks, automating hours of manual research across various domains.

Limitations of Deep Research

Deep Research unlocks impressive capabilities, but it has limitations. It may occasionally hallucinate facts or make incorrect inferences, though OpenAI says its error rate is lower than that of existing ChatGPT models.

It can also struggle to differentiate between authoritative sources and unreliable information, and may not always convey uncertainty accurately. At launch, there may be minor formatting errors in reports and citations. OpenAI expects these issues to improve with time and usage.

Access and Availability

Deep Research is available on ChatGPT’s web version and will be rolled out to mobile and desktop apps within the month. Due to its compute-intensive nature, access is currently limited to Pro users, with up to 100 queries per month.

  • Plus and Team users will gain access next, followed by Enterprise users. OpenAI is also working on expanding access to users in the United Kingdom, Switzerland, and the European Economic Area.
  • In the coming weeks, OpenAI plans to release a faster, more cost-effective version of Deep Research powered by a smaller model, offering higher query limits for all paid users.

What’s Next for OpenAI

Looking ahead, OpenAI envisions integrating Deep Research with Operator, its agent that can take actions on the web on a user’s behalf. This combination would enable ChatGPT to perform increasingly sophisticated tasks by blending asynchronous online research with real-world execution.
