Date: 2023-12-20
OpenAI to establish a Safety Advisory Group for reviewing frontier AI models

The Preparedness team will assess and evaluate the capabilities of frontier models. It is also expected to identify the potential risks these models pose and relay them to the company's Safety Systems team.

OpenAI is establishing a 'Safety Advisory Group' responsible for reviewing reports that detail the safety parameters of AI models. On the basis of these reports, OpenAI's leadership will decide whether a model is ready to be deployed or needs further development. In a recent blog post, the company elaborated on the responsibilities of its Preparedness team, set up in October 2023 to address the risks posed by frontier AI models.

What are frontier AI models? As OpenAI explained in another blog post, "frontier AI models" is a term for "highly capable foundation models that could possess dangerous capabilities sufficient to pose severe risks to public safety". Because their risks can be hard to foresee, these models also pose regulatory challenges: it is difficult to prevent the misuse of models that have already been deployed and to restrict the further spread of their capabilities.

To address the issues raised by such advanced models, OpenAI set up the Preparedness team to assess and evaluate the capabilities of frontier models. The team is expected to "track, evaluate, forecast and protect against catastrophic risks" in areas including individualised persuasion; cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; and autonomous replication and adaptation.

More about OpenAI's Preparedness Framework:

The Preparedness team is expected to identify the potential risks of frontier models and coordinate with two other OpenAI teams: the Safety Systems team, which focuses on existing products such as ChatGPT, and the Superalignment team, which focuses on future superintelligent models.

According to the Preparedness Framework (Beta), OpenAI will design a scorecard for evaluating AI models. The blog post explained:

"We will define risk thresholds that trigger baseline safety measures. We have defined thresholds for risk levels along the following initial tracked categories – cybersecurity, CBRN (chemical, biological, radiological, nuclear threats), persuasion, and model autonomy. We specify four safety risk levels, and only models with a post-mitigation score of 'medium' or below can be deployed; only models with a post-mitigation score of 'high' or below can be developed further. We will also implement additional security measures tailored to models with high or critical (pre-mitigation) levels of risk."
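To make the gating rule in this passage concrete, here is a minimal Python sketch. It assumes the four levels are Low, Medium, High and Critical, as named in the Preparedness Framework (Beta), and, as a simplifying assumption not stated in the quote, that a model's overall score is its worst score across the four tracked categories. `RiskLevel`, `overall_score`, `can_deploy` and `can_develop_further` are hypothetical names for illustration, not part of any OpenAI tool.

```python
from enum import IntEnum
from typing import Mapping


class RiskLevel(IntEnum):
    """The four risk levels, ordered from least to most severe."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# The four initial tracked categories listed in the quoted passage.
TRACKED_CATEGORIES = ("cybersecurity", "cbrn", "persuasion", "model_autonomy")


def overall_score(scores: Mapping[str, RiskLevel]) -> RiskLevel:
    # Assumption (hypothetical): the overall score is the worst, i.e. highest,
    # score across all tracked categories.
    return max(scores[category] for category in TRACKED_CATEGORIES)


def can_deploy(post_mitigation: Mapping[str, RiskLevel]) -> bool:
    # Deployment requires a post-mitigation score of "medium" or below.
    return overall_score(post_mitigation) <= RiskLevel.MEDIUM


def can_develop_further(post_mitigation: Mapping[str, RiskLevel]) -> bool:
    # Further development requires a post-mitigation score of "high" or below.
    return overall_score(post_mitigation) <= RiskLevel.HIGH


if __name__ == "__main__":
    # Hypothetical post-mitigation scorecard for an illustrative model.
    scores = {
        "cybersecurity": RiskLevel.LOW,
        "cbrn": RiskLevel.MEDIUM,
        "persuasion": RiskLevel.HIGH,
        "model_autonomy": RiskLevel.LOW,
    }
    print(can_deploy(scores), can_develop_further(scores))  # prints: False True
```

In this illustrative scorecard, a single "high" post-mitigation score in persuasion is enough to block deployment while still permitting further development. Taking the maximum across categories is a conservative choice: a low-risk score in one category cannot offset a high-risk score in another.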

In addition to regular safety drills, the company said it expects independent third parties to audit and evaluate its models to help mitigate safety risks.

Further, the Preparedness team will carry out technical work examining the capabilities and limits of frontier models and produce reports based on the resulting safety scores. These reports will form the basis of recommendations on model deployment and development, which will be shared with the Safety Advisory Group and OpenAI's leadership. The Board of Directors has the power to veto such decisions.

