🌼 Realtime API
OpenAI has just introduced its Realtime API, now in public beta, allowing developers to build low-latency, multimodal experiences—particularly in speech-to-speech applications.
What does this mean? You can now integrate ChatGPT’s voice controls directly into apps, enabling real-time, natural conversations. A perfect use case for call centers.
OpenAI demoed the API with Wanderlust, a travel-planning app originally showcased last year.
With the Realtime API, you can chat with the app, plan trips by speaking naturally, and even interrupt mid-sentence, creating a conversational flow that mirrors human dialogue.
But travel planning is just the tip of the iceberg. The Realtime API opens doors to a wide range of applications—from customer service to education and accessibility tools. Imagine voice-controlled apps that respond instantly and feel more like a conversation than a command.
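Under the hood, the Realtime API is WebSocket-based: the client and server exchange JSON events over a persistent connection. Here is a minimal sketch of the kinds of client events involved; the event names and fields follow the public beta announcement and may change, and the voice and instructions shown are just illustrative defaults:

```python
import json

# Sketch of the JSON events a Realtime API client sends over its
# WebSocket connection (wss://api.openai.com/v1/realtime).
# Event names/fields are from the public beta and may change.

def session_update(voice="alloy", instructions="You are a helpful travel agent."):
    """Configure the session: modalities, voice, and system instructions."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text", "audio"],
            "voice": voice,
            "instructions": instructions,
        },
    })

def user_text_message(text):
    """Append a user text message to the conversation."""
    return json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            "content": [{"type": "input_text", "text": text}],
        },
    })

# Ask the model to respond; audio streams back as server-side events.
RESPONSE_CREATE = json.dumps({"type": "response.create"})
```

Because the connection stays open, the client can send a new event mid-response, which is what makes natural interruptions possible.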
“We focus on both startups and enterprises,” OpenAI noted.
That said, the API isn’t exactly cheap:
- $0.06 per minute of audio input
- $0.24 per minute of audio output
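At those rates, a back-of-the-envelope estimate is easy to run. A quick sketch, assuming roughly equal talk time on each side of the call:

```python
# Rough per-call cost at the listed beta rates.
AUDIO_IN_PER_MIN = 0.06   # $ per minute of audio input
AUDIO_OUT_PER_MIN = 0.24  # $ per minute of audio output

def call_cost(input_minutes, output_minutes):
    """Dollar cost of a call given minutes of audio in each direction."""
    return AUDIO_IN_PER_MIN * input_minutes + AUDIO_OUT_PER_MIN * output_minutes

# A 10-minute call where each side speaks about 5 minutes:
# 5 * 0.06 + 5 * 0.24 = $1.50
```

So a busy call center handling thousands of calls a day would still face a meaningful bill, though likely far below the cost of human agents.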
The graphic design and call center industries are already feeling the impact of large language models (LLMs); could legal be next?
🌼 Vision Fine Tune
OpenAI is also rolling out vision fine-tuning. It lets developers tailor the model’s visual understanding using both images and text, unlocking new possibilities across a variety of industries.
Images in, text out: the model is fine-tuned on labeled image examples and answers questions about what it sees. For instance, an app could let users supply image or video URLs from the web and then ask questions about the content conversationally. Other natural fits include autonomous vehicles, medical imaging, and visual search, all of which rely heavily on precise visual data interpretation.
One standout early adopter is Grab, a leading Southeast Asian food delivery and rideshare company. Using vision fine-tuning, Grab was able to significantly improve its mapping services. With just 100 examples, the company saw a 20% increase in lane count accuracy and a 13% boost in speed limit sign localization. These impressive results demonstrate how small batches of visual training data can lead to dramatic improvements in AI-powered systems.
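Training data for vision fine-tuning is supplied in the chat-style JSONL format, with images referenced inside the user message. A minimal sketch of one training example, loosely inspired by the lane-counting use case above (the prompt, URL, and label are made up for illustration):

```python
import json

# One hypothetical training example in chat-style JSONL format for
# vision fine-tuning; the image URL and label are illustrative only.
example = {
    "messages": [
        {"role": "system", "content": "Count the traffic lanes in the image."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "How many lanes does this road have?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/road.jpg"}},
            ],
        },
        {"role": "assistant", "content": "4"},
    ]
}

# Each line of the .jsonl training file is one such example;
# per the Grab result above, ~100 lines can already move the needle.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```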
Looking further ahead, fleets of UI agents built on vision fine-tuning, combined with capabilities like retrieving information from websites, could automate and streamline workflows in unprecedented ways.
Pricing is also reasonable:
- Training: the first 1M tokens for fine-tuning GPT-4o with images are free through October 31, 2024; $25 per 1M tokens after that.
- Inference: $3.75 per 1M input tokens and $15 per 1M output tokens.
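To put those token rates in context, here is a rough cost sketch for training past the free window and then serving traffic; the volumes are hypothetical:

```python
# Token pricing for fine-tuned GPT-4o with images, per the listed rates.
TRAIN_PER_M = 25.0  # $ per 1M training tokens (after the free period)
IN_PER_M = 3.75     # $ per 1M input tokens at inference
OUT_PER_M = 15.0    # $ per 1M output tokens at inference

def finetune_cost(train_tokens_m):
    """Dollar cost to train on train_tokens_m million tokens."""
    return TRAIN_PER_M * train_tokens_m

def inference_cost(in_tokens_m, out_tokens_m):
    """Dollar cost to serve the given millions of input/output tokens."""
    return IN_PER_M * in_tokens_m + OUT_PER_M * out_tokens_m

# e.g. 2M training tokens, then 1M input / 0.2M output tokens of traffic:
# 2 * 25 + (1 * 3.75 + 0.2 * 15) = 50 + 6.75 = $56.75
```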