Realtime API
OpenAI has just introduced its Realtime API, now in public beta, allowing developers to build low-latency, multimodal experiences, particularly speech-to-speech applications.
What does this mean? You can now integrate ChatGPT's voice controls directly into apps, enabling real-time, natural conversations. A perfect use case for call centers.
OpenAI demoed the Realtime API on Wanderlust, a travel-planning app originally showcased last year.
With the Realtime API, you can chat with the app, plan trips by speaking naturally, and even interrupt mid-sentence, creating a conversational flow that mirrors human dialogue.
But travel planning is just the tip of the iceberg. The Realtime API opens doors to a wide range of applications, from customer service to education and accessibility tools. Imagine voice-controlled apps that respond instantly and feel more like a conversation than a command.
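Under the hood, the beta Realtime API is an event-driven WebSocket protocol: the client configures a session, appends conversation items, and asks the model to respond. Here is a minimal Python sketch of the client-side events such a session might send. The URL, event names, and field layout reflect the public beta documentation and should be treated as assumptions; actually sending them requires an API key and a WebSocket client.

```python
import json

# Assumed beta endpoint; the exact model name may differ.
REALTIME_URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

def session_update(instructions: str, voice: str = "alloy") -> dict:
    """Configure the session: modalities, voice, and system-style instructions."""
    return {
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "instructions": instructions,
        },
    }

def user_text_item(text: str) -> dict:
    """Append a user message to the conversation."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    }

# Asks the model to generate a (spoken and/or text) response.
RESPONSE_CREATE = {"type": "response.create"}

if __name__ == "__main__":
    # Sending these over a WebSocket (e.g. with the `websockets` package)
    # would also require the beta header "OpenAI-Beta: realtime=v1";
    # here we just print the JSON payloads.
    for event in (
        session_update("You are a travel-planning assistant."),
        user_text_item("Plan me a weekend in Lisbon."),
        RESPONSE_CREATE,
    ):
        print(json.dumps(event))
```

The interruption behavior described above comes from the server streaming events back over the same socket, so the client can cut a response short and immediately queue a new one.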
"We focus on both startups and enterprises." - OpenAI
Now, while the API isn't exactly cheap:
- $0.06 per minute for audio input
- $0.24 per minute for audio output
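At those rates, a quick back-of-the-envelope calculation shows what a single call costs. The function and the example call below are illustrative, not from OpenAI:

```python
# Per-minute audio rates quoted above (public beta pricing).
AUDIO_IN_PER_MIN = 0.06   # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24  # $ per minute of audio output

def call_cost(input_minutes: float, output_minutes: float) -> float:
    """Estimated audio cost in dollars for one conversation."""
    return input_minutes * AUDIO_IN_PER_MIN + output_minutes * AUDIO_OUT_PER_MIN

# A 10-minute support call where the caller and the bot each
# speak roughly half the time:
print(f"${call_cost(5, 5):.2f}")  # → $1.50
```

For a call center handling thousands of calls a day, that per-minute output rate dominates the bill, which is worth keeping in mind when deciding how talkative the bot should be.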
Graphic design and call centers are already feeling the impact of large language models (LLMs). Which industry is next? Legal?
Vision Fine-Tuning
Vision fine-tuning enables developers to tailor the model's visual understanding using both images and text, opening up exciting new possibilities for a variety of industries.
Images in, better answers out.
Picture an easy-to-use solution where users paste URLs of images or videos from the web, then work through a set of questions about the content in a conversational manner.
Think autonomous vehicles, medical imaging, and visual search, all of which rely heavily on precise visual data interpretation.
One standout early adopter is Grab, a leading Southeast Asian food delivery and rideshare company. Using vision fine-tuning, Grab was able to significantly improve its mapping services. With just 100 examples, the company saw a 20% increase in lane count accuracy and a 13% boost in speed limit sign localization. These impressive results demonstrate how small batches of visual training data can lead to dramatic improvements in AI-powered systems.
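Training examples like Grab's are supplied in the same chat-style JSONL format used for text fine-tuning, with image URLs embedded in the user message. The helper below sketches that format; the URL, question, and label are hypothetical stand-ins in the spirit of the lane-counting use case, and the exact schema should be checked against OpenAI's fine-tuning docs:

```python
import json

def make_example(image_url: str, question: str, answer: str) -> dict:
    """One chat-format training example pairing an image with the desired answer."""
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
            # The assistant turn is the ground-truth label the model learns.
            {"role": "assistant", "content": "3"} if answer == "3" else
            {"role": "assistant", "content": answer},
        ]
    }

# Hypothetical mapping-style example: one line of the training JSONL file.
example = make_example(
    "https://example.com/road.jpg",
    "How many lanes does this road have?",
    "3",
)
print(json.dumps(example))
```

The Grab result above suggests you do not need a huge file: on the order of a hundred such lines was enough to move their accuracy metrics.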
A fleet of UI agents using vision fine-tuning, combined with capabilities like retrieving information from websites, could automate and streamline workflows in unprecedented ways.
Pricing is also reasonable:
- Training: 1M tokens free until October 31, 2024, for fine-tuning GPT-4o with images; $25 per 1M tokens after that.
- Inference: $3.75 per 1M input tokens and $15 per 1M output tokens.
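To put the inference rates in perspective, here is a rough per-request estimate for a fine-tuned model. The token counts are made-up round numbers; image inputs are billed as tokens, so a real prompt's count depends on image size and detail settings:

```python
# Inference rates quoted above for fine-tuned GPT-4o.
INPUT_PER_M = 3.75   # $ per 1M input tokens
OUTPUT_PER_M = 15.0  # $ per 1M output tokens

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request against a fine-tuned model."""
    return (input_tokens / 1_000_000) * INPUT_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g. a request with a ~1,000-token image prompt and a 200-token answer:
print(f"${inference_cost(1_000, 200):.4f}")
```

At well under a cent per request in this sketch, the bigger line item for most teams will be curating the training images, not the API bill.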