Advanced Synthetic Voices Can Make Scams More Convincing
Models Have Been Tuned for Accuracy, Reliability, and Realism
OpenAI is expanding its stable of AI voices to include agentic models, enabling two-step processes such as asking an AI to buy plane tickets or change a customer’s order. The new models are gpt-4o-transcribe and gpt-4o-mini-transcribe, both speech-to-text models, and gpt-4o-mini-tts, a text-to-speech model.
Developer Access and Integration
Developers can access these models on the OpenAI API and integrate them with the Agents SDK. Adding text-to-speech and speech-to-text to the API enables a variety of AI applications, including agentic tools.
Potential Scam Risks
The company wants to enable "deeper, more intuitive interactions with agents beyond just text," but adding flexibility and greater autonomy in voice models raises the possibility of more convincing scam bots. However, OpenAI is continuing to engage in conversations with policymakers, researchers, developers, and creatives around the challenges and opportunities synthetic voices can present.
Custom Voices and Personalized Experiences
OpenAI is also pursuing ways to use video in agentic AI experiences. Developers will be able to bring "custom voices" for "personalized experiences in ways that align with our safety standards." This could lead to more convincing and engaging interactions.
Accurate Transcription and Realistic Voices
The new speech-to-text and text-to-speech audio tools have been tuned for accuracy and reliability, particularly in conversations involving accents, noisy environments, and varying speech speeds. The models are intended for uses such as customer call centers and meeting transcription. The text-to-speech model can also be instructed to speak in specific ways, for example in a dramatic or cheerful tone.
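As a rough illustration of how a developer might ask the text-to-speech model for a particular speaking style, the Python sketch below builds an HTTP request without sending it. It assumes the public `/v1/audio/speech` endpoint and its `model`, `input`, `voice`, and `instructions` fields. The voice name `coral`, the helper `build_speech_request`, and the sample text are illustrative choices that do not come from the article, so check them against OpenAI's current documentation before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint for OpenAI's text-to-speech API; verify against current docs.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, style, voice="coral", api_key=None):
    """Build (but do not send) a text-to-speech request with a style instruction.

    This helper is hypothetical and only shows the shape of the request.
    """
    payload = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,          # assumed voice name
        "instructions": style,   # free-text description of how to speak
    }
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request(
        "Once upon a time, a small robot learned to sing.",
        style="Speak like a calm bedtime-story narrator.",
    )
    print(req.full_url)
    print(json.loads(req.data)["instructions"])
    # To actually synthesize audio (needs a valid key and network access):
    # with urllib.request.urlopen(req) as resp, open("story.mp3", "wb") as out:
    #     out.write(resp.read())
```

Keeping request construction separate from sending makes the style instruction easy to inspect or log before any audio is generated. That can also be a convenient point to apply content or safety checks.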
Potential Use Cases
OpenAI envisions some of these AI models being used for "expressive narration for creative storytelling experiences," which could include use at theme parks or theatrical events. The company suggests example voices such as "bedtime story," "surfer," "true crime buff," and "medieval knight."
Conclusion
OpenAI’s new agentic models give developers a range of options for building more engaging and realistic interactions. The same flexibility carries risks, chief among them more convincing scam bots.
FAQs
Q: What are agentic models?
A: Agentic models are a type of AI that enables two-step processes, such as asking an AI to buy plane tickets or change a customer’s order.
Q: What are the new speech-to-text and text-to-speech models?
A: The new models are gpt-4o-transcribe and gpt-4o-mini-transcribe, both speech-to-text models, and gpt-4o-mini-tts, a text-to-speech model.
Q: Can I use these models for personalized experiences?
A: Yes, developers will be able to bring "custom voices" for "personalized experiences in ways that align with our safety standards."
Q: Can I use these models for creative storytelling experiences?
A: Yes, OpenAI envisions these AI models being used for "expressive narration for creative storytelling experiences," for example at theme parks or theatrical events.