🌼 Realtime API
OpenAI has just introduced its Realtime API, now in public beta, allowing developers to build low-latency, multimodal experiences—particularly in speech-to-speech applications.
What does this mean? You can now integrate ChatGPT’s voice controls directly into apps, enabling real-time, natural conversations. A perfect use case for call centers.
OpenAI demoed the API with Wanderlust, a travel-planning app originally showcased last year.
With the Realtime API, you can chat with the app, plan trips by speaking naturally, and even interrupt mid-sentence, creating a conversational flow that mirrors human dialogue.
But travel planning is just the tip of the iceberg. The Realtime API opens doors to a wide range of applications—from customer service to education and accessibility tools. Imagine voice-controlled apps that respond instantly and feel more like a conversation than a command.
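Under the hood, the Realtime API is WebSocket-based: the client and server exchange JSON events over a persistent connection. Here is a minimal sketch of the kinds of client events involved; the event names and fields follow the public beta announcement and may change, and the voice and instructions shown are just illustrative defaults:

```python
import json

# Sketch of the JSON events a Realtime API client sends over its
# WebSocket connection (wss://api.openai.com/v1/realtime).
# Event names/fields are from the public beta and may change.

def session_update(voice="alloy", instructions="You are a helpful travel agent."):
    """Configure the session: modalities, voice, and system instructions."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "instructions": instructions,
        },
    })

def user_text_message(text):
    """Append a user text message to the conversation."""
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    })

# Ask the model to respond; audio streams back as server-side events.
RESPONSE_CREATE = json.dumps({"type": "response.create"})
```

Because the connection stays open, the client can send a new event mid-response, which is what makes natural interruptions possible.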
“We focus on both startups and enterprises,” OpenAI noted.
That said, the API isn’t exactly cheap:
- $0.06 per minute of audio input
- $0.24 per minute of audio output
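At those rates, a back-of-the-envelope estimate is easy to run. A quick sketch, assuming roughly equal talk time on each side of the call:

```python
# Rough per-call cost at the listed beta rates.
AUDIO_IN_PER_MIN = 0.06   # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24  # $ per minute of audio output

def call_cost(input_minutes, output_minutes):
    """Dollar cost of a call given minutes of audio in each direction."""
    return AUDIO_IN_PER_MIN * input_minutes + AUDIO_OUT_PER_MIN * output_minutes

# A 10-minute call where each side speaks about 5 minutes:
# 5 * 0.06 + 5 * 0.24 = $1.50
```

So a busy call center handling thousands of calls a day would still face a meaningful bill, though likely far below the cost of human agents.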
The graphic design and call center industries are already feeling the impact of large language models (LLMs); could legal be next?
🌼 Vision Fine Tune
OpenAI is also rolling out vision fine-tuning. It lets developers tailor the model’s visual understanding using both images and text, unlocking new possibilities across a variety of industries.
Images in, text out: the model is fine-tuned on labeled image examples and answers questions about what it sees. For instance, an app could let users supply image or video URLs from the web and then ask questions about the content conversationally. Other natural fits include autonomous vehicles, medical imaging, and visual search, all of which rely heavily on precise visual data interpretation.
One standout early adopter is Grab, a leading Southeast Asian food delivery and rideshare company. Using vision fine-tuning, Grab was able to significantly improve its mapping services. With just 100 examples, the company saw a 20% increase in lane count accuracy and a 13% boost in speed limit sign localization. These impressive results demonstrate how small batches of visual training data can lead to dramatic improvements in AI-powered systems.
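Training data for vision fine-tuning is supplied in the chat-style JSONL format, with images referenced inside the user message. A minimal sketch of one training example, loosely inspired by the lane-counting use case above (the prompt, URL, and label are made up for illustration):

```python
import json

# One hypothetical training example in chat-style JSONL format for
# vision fine-tuning; the image URL and label are illustrative only.
example = {
    "messages": [
        {"role": "system", "content": "Count the traffic lanes in the image."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How many lanes does this road have?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/road.jpg"}},
            ],
        },
        {"role": "assistant", "content": "4"},
    ]
}

# Each line of the .jsonl training file is one such example;
# per the Grab result above, ~100 lines can already move the needle.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```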
Looking further ahead, fleets of UI agents built on vision fine-tuning, combined with capabilities like retrieving information from websites, could automate and streamline workflows in unprecedented ways.
Pricing is also reasonable:
- Training: the first 1M tokens for fine-tuning GPT-4o with images are free through October 31, 2024; $25 per 1M tokens after that.
- Inference: $3.75 per 1M input tokens and $15 per 1M output tokens.
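To put those token rates in context, here is a rough cost sketch for training past the free window and then serving traffic; the volumes are hypothetical:

```python
# Token pricing for fine-tuned GPT-4o with images, per the listed rates.
TRAIN_PER_M = 25.0  # $ per 1M training tokens (after the free period)
IN_PER_M = 3.75     # $ per 1M input tokens at inference
OUT_PER_M = 15.0    # $ per 1M output tokens at inference

def finetune_cost(train_tokens_m):
    """Dollar cost to train on train_tokens_m million tokens."""
    return TRAIN_PER_M * train_tokens_m

def inference_cost(in_tokens_m, out_tokens_m):
    """Dollar cost to serve the given millions of input/output tokens."""
    return IN_PER_M * in_tokens_m + OUT_PER_M * out_tokens_m

# e.g. 2M training tokens, then 1M input / 0.2M output tokens of traffic:
# 2 * 25 + (1 * 3.75 + 0.2 * 15) = 50 + 6.75 = $56.75
```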