The AI Agent Landscape: What Data Scientists Should Know (And Expect)
The ultimate guide to understanding and getting started with AI agents
👋 Hey, it’s Andres. If you are new here, welcome to my newsletter exploring the journey of becoming and thriving as a data scientist.
Every week I share practical insights to help you stay relevant and advance your career. Don’t forget to subscribe so you don’t miss future articles. It’s free!
We hear the word agent thrown around so often in tech these days that it’s easy to feel like we already know what it means, even if we’ve never read a single research paper about it or tried building one ourselves.
And maybe that’s fine for the average person.
But as data scientists, we can’t afford to stop there.
And yeah, I know—agents are the newest buzzword. But just because there’s a lot of hype around them doesn’t mean they’re not powerful or that they’re not already shaping how our field evolves. If we don’t take the time to truly understand how they work, we’ll miss the chance to be the ones building them, deploying them, and redefining how our work gets done.
Believe me, it wasn’t until I started trying to build my own AI agents to augment/automate my Data Science work that I realized just how shallow my understanding really was.
So if you’re hoping to take that next step and actually get what agents are, how they work, and why they matter, this article is for you.
Here are some of the things we’ll cover:
What AI agents actually are (and how they differ from LLMs)
The core components that make agents work
A high-level overview of today’s top agent frameworks
Why this matters for data scientists (and how it’s already changing workflows)
👨‍💻 Also, a glimpse into the project I’m building to help data scientists future-proof their careers with AI (more on that at the end of the article)
Alright, let’s get to it!
From LLMs to Agents: A Conceptual Jump
Let’s start by making one thing clear: an agent is not just a more advanced LLM. Rather, think of an agent as a system that uses an LLM (or several) to take actions toward a goal over multiple steps.
Here is a breakdown to help paint a clearer picture:
LLMs (like Llama 3 or GPT-4) are stateless, single-step predictors: they generate one-off responses based on their input, with no memory (by default) and no planning unless you simulate it manually.
Agents are built on top of LLMs. They add a control layer that can:
Use memory
Call tools or APIs
Make decisions across multiple steps
Maintain state and pursue goals
Agents orchestrate multiple LLM calls, use external tools, and retain context over time. They don't exist without LLMs but represent a more powerful abstraction built using them.
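To make that distinction concrete, here’s a minimal, framework-free sketch. The `call_llm` stub stands in for whatever model client you actually use (OpenAI, Ollama, etc.), and the `Agent` class is purely illustrative: the point is that a plain LLM call is a single stateless function call, while an agent wraps that same call in a control layer that keeps memory and decides the next step.

```python
# Illustrative only; not tied to any specific SDK.
def call_llm(prompt: str) -> str:
    """Stand-in for your model client: input in, text out, nothing remembered."""
    return f"[model response to: {prompt[:40]}...]"

# --- Plain LLM usage: one stateless call ---
answer = call_llm("Summarize this dataset: ...")

# --- Agent usage: a control layer around the same model ---
class Agent:
    def __init__(self, tools: dict):
        self.tools = tools            # callable tools the agent may invoke
        self.memory: list[str] = []   # running context carried across steps

    def step(self, goal: str) -> str:
        # The LLM decides the next move, given the goal and what happened so far.
        decision = call_llm(f"Goal: {goal}\nHistory: {self.memory}\nNext action?")
        self.memory.append(decision)
        return decision
```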
Anatomy of an Agent
At its core, an agent is just a system designed to pursue a goal, but the way it does that is what sets it apart.
Here’s the typical breakdown:
Goal: Every agent starts with an objective. This could be as simple as “summarize this dataset” or as open-ended as “analyze customer churn.”
Reasoning loop: Unlike a single LLM prompt, agents run in loops. They decide what to do, act, observe the result, then decide again. It’s iterative.
Tool use & memory: Agents can call APIs, run code, use databases, and retain context across steps. This makes them far more dynamic than a basic chatbot.
Reflection: Some agents even evaluate their own progress, adjusting plans along the way. Think: “Did this work? If not, what should I try next?”
One pattern you’ll often see in open-source agents is ReAct, short for Reasoning + Acting. It’s a simple but powerful structure where the agent reasons about the problem, takes an action, observes the outcome, and repeats.
The key idea is this: the magic isn’t just in the model, it’s in the loop.
Here is a good article that walks through the ReAct framework.
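And here’s roughly what that loop can look like in code. This is a stripped-down sketch of the ReAct pattern, not any particular library’s implementation: `call_llm` and the `tools` dictionary are stand-ins you would swap for your own model client and tool functions.

```python
import re

def react_loop(goal: str, tools: dict, call_llm, max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: reason -> act -> observe -> repeat."""
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # 1. Reason: ask the model for a thought and an action, or a final answer.
        reply = call_llm(
            transcript
            + "Respond with either:\n"
            + "Thought: ...\nAction: tool_name(argument)\n"
            + "or\nFinal Answer: ..."
        )
        transcript += reply + "\n"

        if "Final Answer:" in reply:
            return reply.split("Final Answer:")[-1].strip()

        # 2. Act: parse the requested tool call and run it.
        match = re.search(r"Action:\s*(\w+)\((.*)\)", reply)
        if not match:
            continue  # model didn't follow the format; try another step
        tool_name, argument = match.group(1), match.group(2)
        result = tools.get(tool_name, lambda _: "unknown tool")(argument)

        # 3. Observe: feed the result back into the context for the next step.
        transcript += f"Observation: {result}\n"
    return "Stopped: step limit reached."
```

In practice, `tools` would map names like `run_sql` or `run_python` to real functions, and you’d add logging, timeouts, and stricter output parsing around the loop.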
MCP vs ACP vs A2A: What’s the Difference?
As agents gain traction, different standards are emerging for how to structure them. Three you’ll likely come across are:
MCP (Model Context Protocol)
An open standard from Anthropic that acts like a USB‑C port for LLMs: one server, many tools. It standardizes how models access external data and APIs, making tool-assisted prompting and API chains easier to scale. But it’s not a full agent framework: no planning loop, no autonomy. Just a clean way to connect models to the outside world. More information via the official documentation.
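If you want to see what the “USB‑C port” idea looks like in practice, here’s a minimal tool-server sketch, assuming the official MCP Python SDK’s FastMCP helper (installed via `pip install "mcp[cli]"`). The `describe_csv` tool is a made-up example; check the docs for the current API before relying on it.

```python
# Minimal MCP tool server sketch (illustrative tool, assumed SDK layout).
# Any MCP-compatible client/host could discover and call this tool over the protocol.
from mcp.server.fastmcp import FastMCP
import pandas as pd

mcp = FastMCP("data-tools")

@mcp.tool()
def describe_csv(path: str) -> str:
    """Return summary statistics for a local CSV file."""
    return pd.read_csv(path).describe().to_string()

if __name__ == "__main__":
    mcp.run()  # serves the tool over MCP (stdio transport by default)
```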
ACP (Agent Communication Protocol)
ACP formalizes agents a bit more: each agent exposes its own identity and capabilities, and context is continuously updated as agents interact. This is a step toward goal-driven, interoperable behavior. Here is the official documentation.
A2A (Agent-to-Agent)
This is about multi-agent collaboration. Instead of a single agent doing everything, multiple agents specialize and communicate to complete a task. Useful for delegation, parallelism, or complex workflows that require different roles. More information via the official documentation.
In short:
MCP standardizes how LLMs access external tools and data.
ACP builds more dynamic, goal-aware agents.
A2A coordinates teams of agents for broader tasks.
Hopefully this will help you choose the right approach for your use case. If not, this article does a great job of comparing them in depth.
Landscape: Tools & SDKs
If you’ve been wondering where to even start with agents, you’re not alone. The landscape is expanding fast, and it’s easy to get lost in the buzzwords.
So before diving into building anything, I think it’s helpful to zoom out.
Below is a high-level look at the most popular open-source agent frameworks right now: how they differ and what they’re actually good for. The landscape spans both fully autonomous agents (e.g., Auto-GPT) and orchestration frameworks (e.g., LangChain), so there’s an option for most use cases.
If you want more info, here is a great article comparing the different agent frameworks out there.
Why this matters for 👨‍💻 Data Scientists
As data scientists, we spend a huge portion of our time not on modeling, but on navigating complexity:
Figuring out where to start
Exploring edge cases
Debugging pipelines
Refining our analysis based on vague stakeholder questions
Shall I continue?
Agents can reshape that workflow, not by doing the thinking for us, but by handling the repetitive loops we constantly find ourselves in.
Imagine running an initial EDA, having the agent flag unusual distributions, suggest relevant breakdowns, and automatically re-run segments of the analysis, without you having to re-prompt it each time.
Or having it track your working assumptions, generate tests, or propose hypotheses worth validating.
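Here’s a hypothetical sketch of what that loop could look like: deterministic checks surface findings, then a model call proposes follow-ups. The function names and thresholds are illustrative, and `call_llm` is whatever model client you use.

```python
import pandas as pd

def flag_unusual_columns(df: pd.DataFrame) -> list[str]:
    """Simple heuristics an agent could run automatically after loading data."""
    findings = []
    for col in df.select_dtypes("number").columns:
        skew = df[col].skew()
        if abs(skew) > 2:
            findings.append(f"'{col}' is heavily skewed (skew={skew:.1f})")
    for col in df.columns:
        missing = df[col].isna().mean()
        if missing > 0.2:
            findings.append(f"'{col}' is {missing:.0%} missing")
    return findings

def suggest_next_steps(findings: list[str], call_llm) -> str:
    """Hand the findings to the model and ask for relevant breakdowns to run next."""
    return call_llm(
        "Given these EDA findings, suggest 3 follow-up analyses:\n" + "\n".join(findings)
    )
```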
Over the past few months, I’ve started building some of this into my own workflow and I can say with confidence that it has helped me 10x my analysis workflow.
👉 If you’re interested in building systems that use AI to streamline your workflow, from exploratory analysis to reporting and beyond, I’m building a course designed especially for data scientists (and data professionals). Join the waitlist, I’ll be sharing more soon.
That’s the true value of agents for us Data Scientists: faster iteration, tighter feedback loops, and more room to focus on the hard questions.
Real-World example: LAMBDA, a Data Agent built for analysts
Let’s ground all of this with a concrete example.
In mid-2024, researchers released a framework called LAMBDA (Large Model-Based Data Agent). It’s one of the clearest demonstrations of what agentic workflows can look like in real data analysis settings.
🧠 What LAMBDA actually does

LAMBDA breaks the agent’s responsibilities into two modular roles:
The Programmer agent: Generates code (Pandas, SQL, etc.) based on a user query or analysis goal.
The Inspector agent: Reviews the code, catches bugs, improves formatting, and suggests refinements.
Together, they operate in a feedback loop: the Programmer iterates, the Inspector checks the work, and errors are passed back into the loop until the output is functional and correct.
💡 LAMBDA includes a human-in-the-loop interface, so the user can intervene, inspect, or modify any step of the process.
🔍 How it works (in practice)
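I won’t reproduce LAMBDA’s actual code here, but the sketch below illustrates the Programmer/Inspector feedback loop described above. The two LLM callables are stand-ins for model calls with different system prompts, and `exec` stands in for what should be a properly sandboxed kernel in any real system.

```python
# Illustrative Programmer/Inspector loop (not LAMBDA's actual implementation).
def solve(task: str, programmer_llm, inspector_llm, max_rounds: int = 3) -> str:
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        # Programmer: write (or revise) code for the task, given any reviewer feedback.
        code = programmer_llm(
            f"Task: {task}\nReviewer feedback: {feedback}\nWrite Python code."
        )
        try:
            exec(code, {})  # run it (use a real sandboxed kernel in practice!)
        except Exception as err:
            feedback = f"Execution failed: {err}"
            continue
        # Inspector: review the code that ran without errors.
        verdict = inspector_llm(f"Review this code for task '{task}':\n{code}")
        if "approved" in verdict.lower():
            return code  # a human could still inspect or override at this point
        feedback = verdict  # pass review comments back into the loop
    return code
```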
🤔 Why this matters
This setup mimics what many of us already do: write some code, double-check it, refine it. LAMBDA just formalizes that as an agentic process, with clearer roles, tighter feedback loops, and automation of the tedious parts.
Here’s why it’s valuable to data scientists:
The role separation maps to common workflows (writer/reviewer).
It reduces time spent debugging and rewriting similar logic.
It gives you more control, not less: you decide when to step in.
And most importantly, it shows how agent frameworks can be practical, not just theoretical.
Hopefully this gives you some inspiration for your future work!
⭐️ Putting it all together
By now, you’ve seen how to think about agents differently:
Not as some special LLM capability, but as a layered system on top of LLMs
Built using three primitives: memory, planning, and tool use
And part of a broader stack that spans from LLM calls all the way to user-facing products
But this isn’t just about frameworks or toolchains.
If you're serious about building with agents, here's what matters more:
Understand the anatomy
You need to reason about memory, planning, and tool use independently. Most agent failures come from skipping this layer, gluing things together without understanding what each component is responsible for.

Work backward from user intent
An agent isn’t a product. It’s a workflow engine. Start by designing the ideal user experience, and let that inform whether you need autonomy, multi-step reasoning, or just a well-scoped LLM call.

Think like a systems designer, not a prompt engineer
LLMs are powerful, but the leverage comes from building structured loops around them:
👉 Validation, fallbacks, retry logic, monitoring, constraints (see the sketch after this list).

Start small, but with purpose
Pick a use case where agents can unlock clear leverage: faster iteration, reduced manual steps, better decision support. And keep humans in the loop, especially early on.
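As promised above, here’s a small sketch of that “structured loop” idea: validate the model’s output, retry with the error as feedback, and fall back gracefully. The names are illustrative, `call_llm` and `validate` are whatever client and checks fit your pipeline.

```python
import json

def ask_with_validation(prompt: str, call_llm, validate, retries: int = 3):
    """Call the model, check the result, and retry with the error as feedback."""
    feedback = ""
    for attempt in range(retries):
        raw = call_llm(prompt + feedback)
        try:
            result = json.loads(raw)   # constraint: we expect JSON back
            validate(result)           # domain-specific checks (schema, ranges, ...)
            return result
        except (json.JSONDecodeError, ValueError) as err:
            print(f"attempt {attempt + 1} failed: {err}")  # monitoring hook
            feedback = f"\nYour last reply was invalid ({err}). Return valid JSON only."
    return None  # fallback: caller decides what happens next (default value, human review, ...)
```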
The tooling will keep evolving. The wrappers will get more complex.
But if you focus on the fundamentals, you’ll be in a position to actually ship useful AI systems, long after the hype dies down.
A couple of great resources:
💼 Job searching? Applio helps your resume standout and land more interviews.
🤖 Struggling to keep up with AI/ML? Neural Pulse is a 5-minute, human-curated newsletter delivering the best in AI, ML, and data science.
🤝 Want to connect? Let’s connect on LinkedIn, I share lots of career bite-size advice every week.
Thank you for reading! I hope this guide helps you start upskilling (and don’t forget to join the waitlist for the “Future Proof AI” course).
See you next week!
- Andres
Before you go, please hit the like ❤️ button at the bottom of this email to help support me. It truly makes a difference!