Developers will tell you: AI agents are still too early. They’re expensive, unpredictable, and sometimes unreliable. They’re not wrong: developers are feeling the heat in production.
But there’s a spark of something promising in recent research. A glimpse of the future. And it suggests that the future will be AGENTIC in the next 1-2 years.
Today, let’s dive into a paper from Stanford University titled “ALIGNING AI AGENTS VIA INFORMATION-DIRECTED SAMPLING”.
AI will change how we create. 83% of creators are using AI, much like the iPhone gave everyone access to cameras.
Hey, my name is Raahul. I have been trapped in the AI industry for 10 years. As a developer, I thought I'd share what I've learned about LLMs, recent AI innovations, and, obviously, agents, in simple language while I'm here.
That's why I write a newsletter called Musings On AI.
It is a daily email that concisely tells readers about LLMs, AI agents, AI business, and culture in a conversational, witty way.
I get the attention of a 20K+ audience several mornings a week at 8:02 AM ET. It will always be free. This is today’s edition. If you don’t want the next editions, please feel free to UNSUBSCRIBE. No hard feelings :)
Why?
AI alignment is a critical challenge in developing superintelligent agents. The goal? Ensuring these agents align with human values and interests.
Human values differ, and so do human interests. Designing an agent that aligns perfectly with all human preferences? Nearly impossible.
Think about a scenario: you are in a hospital, and some robotic agents are welcoming you.
The AI-powered agents analyze your clinical data, past health records, maybe even credit details. They’re efficient. But then, a human nurse approaches, asking how you’re feeling today, bringing a sense of warmth and understanding that’s hard to replace.
To ensure AI agents make decisions that truly serve us, they must integrate human preferences into their actions. But here’s the catch: many real-world problems are uncertain and only partially observable.
Enter Stanford’s New Approach.
Stanford researchers propose a novel approach to alignment (see reference below). It is built around a class of bandit alignment problems, where an AI agent must balance exploring an environment with querying human preferences to maximize long-term reward.
Traditionally, methods like Thompson Sampling or “explore-then-exploit” strategies attempt to manage this balance. But these often fall short in dynamic environments, accumulating high regret (in other words, a large gap between the reward the agent actually collected and what the best possible choices would have earned).
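To make “regret” concrete, here is a minimal Python sketch (mine, not from the paper) that runs both baselines on a toy Bernoulli bandit and adds up the regret of every pull. The function names, the toy arm probabilities, and the Beta(1, 1) prior are assumptions for illustration only, and the sketch leaves out the human-preference queries that the paper's setting includes.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling; returns cumulative regret."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    wins = [1] * n_arms    # Beta alpha parameters (assumed uniform prior)
    losses = [1] * n_arms  # Beta beta parameters
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm and play the best-looking one.
        samples = [rng.betavariate(wins[a], losses[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        wins[arm] += reward
        losses[arm] += 1 - reward
        # Regret of this pull: what the best arm pays minus what we chose.
        regret += best - true_means[arm]
    return regret

def explore_then_exploit(true_means, horizon, pulls_per_arm, seed=0):
    """Try every arm `pulls_per_arm` times, then commit to the best-looking one."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    totals = [0] * n_arms
    best = max(true_means)
    regret = 0.0
    committed = None
    for t in range(horizon):
        exploring = t < pulls_per_arm * n_arms
        if exploring:
            arm = t % n_arms  # round-robin exploration
        else:
            if committed is None:
                # Arms were pulled equally often, so totals rank the averages.
                committed = max(range(n_arms), key=lambda a: totals[a])
            arm = committed
        reward = 1 if rng.random() < true_means[arm] else 0
        if exploring:
            totals[arm] += reward
        regret += best - true_means[arm]
    return regret

if __name__ == "__main__":
    arms = [0.2, 0.8]  # hypothetical success probabilities
    print("Thompson Sampling regret:", thompson_sampling(arms, 1000))
    print("Explore-then-exploit regret:", explore_then_exploit(arms, 1000, 20))
```

Note that the exploration budget in `explore_then_exploit` has to be picked up front. Too small and it can commit to the wrong arm forever; too large and it wastes pulls on arms it already knows are bad.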
Stanford’s solution? Information-Directed Sampling (IDS).
IDS takes alignment to a new level by minimizing regret and optimizing learning. It’s an algorithm that chooses actions based on two goals: maximizing knowledge gain while staying reward-focused. The magic lies in its information ratio: at each step, IDS weighs how much reward an action is expected to give up right now against how much that action is expected to teach the agent.
This allows IDS to handle alignment with sublinear regret, meaning its average regret per step shrinks toward zero over time, so it keeps improving without sacrificing alignment with human goals.
Tests show IDS consistently outperforms traditional methods, making it a scalable, efficient approach for aligning AI in uncertain, ever-changing environments.
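For the curious, here is a rough Python sketch of the information-ratio idea on a toy Bernoulli bandit. It is not the paper's algorithm. It assumes a simplified, deterministic variant that picks the single arm with the smallest estimated ratio of (expected regret)² to information gain, uses a variance-based stand-in for the information gain estimated from posterior samples, and has no human-query action. All names and parameters (`ids_choose_arm`, `run_ids`, `n_samples`) are hypothetical.

```python
import random

def ids_choose_arm(alpha, beta, rng, n_samples=200):
    """Pick the arm with the smallest sampled information ratio."""
    n_arms = len(alpha)
    # Posterior samples of every arm's mean reward.
    draws = [[rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
             for _ in range(n_samples)]
    mean = [sum(d[a] for d in draws) / n_samples for a in range(n_arms)]
    expected_best = sum(max(d) for d in draws) / n_samples
    # Group the samples by which arm is optimal in each of them.
    groups = {}
    for d in draws:
        star = max(range(n_arms), key=lambda a: d[a])
        groups.setdefault(star, []).append(d)
    best_arm, best_ratio = None, float("inf")
    for a in range(n_arms):
        # Expected reward given up right now by playing arm a.
        regret = expected_best - mean[a]
        # Variance-based gain: how far arm a's mean would move if we
        # learned which arm is truly optimal.
        gain = 0.0
        for ds in groups.values():
            cond_mean = sum(d[a] for d in ds) / len(ds)
            gain += (len(ds) / n_samples) * (cond_mean - mean[a]) ** 2
        ratio = regret ** 2 / max(gain, 1e-12)  # guard against zero gain
        if ratio < best_ratio:
            best_arm, best_ratio = a, ratio
    return best_arm

def run_ids(true_means, horizon, seed=0):
    """Run the simplified IDS loop and return cumulative regret."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms  # assumed uniform Beta(1, 1) priors
    beta = [1] * n_arms
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        arm = ids_choose_arm(alpha, beta, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

if __name__ == "__main__":
    print("Simplified IDS regret:", run_ids([0.2, 0.8], 300))
```

An arm scores well either because it looks cheap to play (low expected regret) or because playing it would settle which arm is best (high gain). That is the trade-off the information ratio captures.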
I’m excited and want to see it in prod soon.
🌸 From The Agent Community
🌼GUI Agents with Foundation Models: A Comprehensive Survey
In this survey, researchers have consolidated cutting-edge work on (M)LLM-based GUI agents, focusing on three main areas:
Datasets and Benchmarks: First, they dive into the key datasets and benchmarks that serve as the foundation for training and evaluating these agents. These datasets allow agents to learn GUI interaction patterns and user expectations, creating a baseline for improvement.
Frameworks and Taxonomy: Next, the survey presents a unified framework, capturing the essential components that researchers use across studies. A detailed taxonomy further categorizes these components, mapping out the variety of approaches and helping future research build on a shared understanding.
Commercial Applications: The practical impact of these advances can already be seen in industries ranging from customer service automation to software testing and beyond. Companies are using (M)LLM-based agents to handle user instructions and perform tasks in real time, reducing the need for manual intervention.
🌸Choice Cuts
🌼A new adaptive gradient method named ADOPT.
🌼 Do websites go away with AI agents?
As AI agents become more common, we could see a shift toward a “dual-use web” where websites serve both humans and bots. While websites remain essential for human users due to the effectiveness of visual UIs, AI agents might increasingly interact directly with data through APIs or automated browser interfaces.
Companies may prioritize their own web interfaces over accommodating third-party AI assistants, leading to a blend of human-friendly UIs and agent-friendly APIs that support both types of users in parallel.
🌼 Hugging Face introduced SmolLM2, a fully open-source, small, high-performance language model. SmolLM2 pushes the limits for language models under 2B parameters with three optimized sizes: 135M, 360M, and 1.7B parameters.
The future of on-device and in-browser models is open, and it's incredibly exciting!
🌼 Developers have been looking for a stable, common chunking library, and we found one.
🌼 Everything I've learned so far about running local LLMs
🌸 Podcasts
There’s a lot more I could write about but I figure very few people will read this far anyways. If you did, you’re amazing and I appreciate you!
Love MusingsOnAI? Tell your friends!
📮Want to Advertise with us?
If your company is interested in reaching an audience of AI professionals and decision-makers, reach out to us.
If you have any comments or feedback, just respond to this email!
Thanks for reading. Let’s explore the world together!
Raahul