MCP Implenda Est

May 16, 2025 · 9 min read

Principal Engineer @ Dylibso

or, a critical look at "A Critical Look at MCP."

I came across this post shortly after finishing support for the HTTP Streaming transport on mcp.run. After a week or so of banging my head against the matrix of OAuth RFC support and client transport support, I should be in a sympathetic frame of mind to receive a critique of the protocol. And yet! Here I am, defending MCP. After all, the good that interfaces do oft lay interred with their blemishes.

And MCP has blemishes.

But I don't think we're in agreement about what they are, and where they pose actual danger to the value that MCP creates.

level setting

First, a bit about mcp.run, to level-set. (And, well, because it's business that sells goods and services -- and, not to put too fine a point on it, one which employs me.)

mcp.run is a website that lets you build your own MCP server combined out of a number of other MCP servers. Those servers might be remote connections or they might be written in Wasm and run on your computer, or a combination of the two. Our goal is to make it easier for businesses to adopt MCP tools by handling the configuration and co-location of servers. We're sort of a "virtual" MCP server, in that sense.

This is important for two reasons:

One, MCP started out as a primarily "on-device" protocol, which meant that your MCP servers would run as a sub-process spawned by your client (Cursor, Windsurf, Claude Desktop, etc.) So delivering MCP servers in Wasm meant that these sub-processes couldn't take unexpected action. It also meant that we could safely host these servers ourselves in preparation for remote server transports to gain client adoption.

Two, as we can host all of these servers for you, we can also hold onto any necessary configuration -- OAuth client data, tokens, configuration -- and ensure that it is only accessed on our servers using separately stored encryption keys. As adoption of MCP begins to tilt from "local servers" to "connecting to remote servers", this becomes more appealing: you can bring your configuration with you across multiple clients and ensure that only trusted tools can be run together. (This sidesteps one of the big security pain points of MCP, which is that MCP is only as secure as the least secure tool you've made available in your client.)

And here's the important thing: MCP is an interface that's ratcheting up to meet the needs of its users. For a while, clients and servers were both local, then servers got more advanced via the SSE transport and tools like mcp-remote. Then clients caught up: see Claude's recent integration of OAuth and SSE transports. Now servers are on the move again: mcp-remote started to make the new HTTP transport available just this last week.

transports are not the bottleneck

Let me underline this: transports are the least important part of MCP. Whether communicating via STDIO, SSE+POST requests, WebSockets, carrier pigeons or HTTP streaming, it doesn't really matter, because they can change (and they have!) JSON-RPC over a bidirectional stream is an incidental choice. You can cobble a bidirectional stream together out of most household items in a pinch, depending on your tolerance for pain. JSONRPC is a useful framing but doesn't really affect the core information carried across that bidirectional wire once you've connected it.

That said, the existing transports can be a pain. Keeping a stateful HTTP connection around (or mapping sessions to outgoing connections) isn't fun: it means having to keep some sticky information around about what server process is executing which client requests, and propagating that mapping down through layers of reverse proxies and CDNs. But even if MCP moved to WebSockets, that's still a long-lived, stateful connection -- and one of those parties between that client and your server is going to eventually close the connection. So the idea that you can map the statefulness of MCP to the statefulness of the transport is an illusory benefit. You must plan for disconnects and reconnects. And yeah, there's probably going to be a queue involved to shuttle messages from ingress MCP route A to egress MCP SSE connection B. But that's life! I fully expect the transports to change to address pain points in hosting MCP servers.

oauth has been going great, really

I turn now to something that MCP is doing really well: baking the concept of authentication into the protocol via OAuth 2.0 standards. This may come as a surprise, because OAuth can be Kind of A Lot -- but it makes a lot of sense if you look at why MCP is valuable in the first place. The promise of MCP -- of any of the tool calling standards -- is that tool-use promotes LLMs from "content printers" to agents that can affect the real world. And you really want a standard around tool use -- how to describe the tool to the LLM for use, how to format the arguments and what to expect from the output. The value there is so great that it really doesn't matter what else you get wrong so long as people -- client and server authors alike -- vote with their feet and implement your protocol.

And that's happening! And at this stage of the protocol's evolution, these servers are starting to be predominately remote. And the easiest way for server authors to give an LLM something to affect the world is to give it access to existing, OAuth 2.0-authenticated APIs. Server authors working on large systems likely already have an OAuth 2.0 API. So they're not stuck reinventing the wheel to get clients authenticated to talk to their APIs.

real problems

So what are the problems with MCP as it stands?

Well, I alluded to one before: the idea that a client will allow you to mix and match connections to various servers is a neat one, but ultimately it works against users. Tools in MCP work by contributing to the system prompt, which gives them privileged access to the LLM and the other tools they work with. So you have a dual problem of managing token usage and ensuring that tools are only used in secure contexts.

(You can see where mcp.run slots into this system: consider paying us money to solve the problem for you!)

Further, one of the issues with remote servers is tenancy -- whether servers are single or multi-tenant, and what that means for the tool descriptions they provide. As I mentioned before, MCP servers started out as predominantly local processes: mapping one server to one client via one connection. As remote transports mature, we're seeing that MCP servers may be multi-tenant -- one server to many client connections, dispatching MCP-level operations like tool listing and calling based on authentication information sent along with the request.

There's an immediate problem: early MCP servers were written with single tenants in mind. It will require work to make them multi-tenant. But that's just typing work. And, I hear, we've got new-fangled tools to make the cost of typing work disappear to zero.

The more subtle problem is that there's no longer a guarantee that an MCP server will advertise the same list of tools or resources to different users (or even the same user over time!) From a certain frame, this is a good thing: servers have control over tool lists, and control makes servers flexible. But consider that the descriptions from tool lists contribute to the system prompt in MCP-enabled sessions! This starts to become a problem: how do you surface that a server became malicious to end users?

(Well, we're working on this solving this over at mcp.run -- and if you're interested, we sell goods and servic-cough. You've heard the spiel enough at this point.)

Finally: the last issue is that there are features in MCP that are less widely implemented. In particular, I'm thinking of resources, prompting, and server sent events. These are features that represent more of a lift for LLM clients. That is to say: while Claude has support for resources, how do other LLMs fare with the concept? The idea of a resource is that you can create a chunk of data that doesn't have to appear in the context window, but the tradeoff is that the LLM has to know to faithfully convey the URI representing that resource to subsequent tool calls. This requires training, which is a far cry from the simple adapter one can write to translate from ChatGPT tool use calls to MCP tool calls.

Similarly, server-sent events requires servers proactively keep a sideband connection to clients open in case they want to send a notification for tool list or resource changes. Many clients don't implement support for this, so it's kind of a dicey proposition for servers to support. Not supporting these notifications costs nothing, while supporting the feature is not guaranteed to pay off.

There's a risk that features like these are annealed out as adoption of the protocol grows, and that's a shame because there are benefits to them.

our job

in which chris spins a sign pointing you to mcp.run for memorial day deals

So if I were to leave you with something, it would be this: MCP's problems aren't caused by a lack of WebSockets. A lot of what you see about the protocol today is the result of rapid adoption, and the danger of rapid adoption is that useful features can be shaved off or implemented in insecure fashion. It's easy to look at STDIO servers and identify a thousand-and-one ways in which they're insecure, but that's not where the puck is heading. Our collective job, as the folks voting with our feet, is to try to shift the protocol's momentum with our implementations so that the when the protocol sets up, it's easy for everyone to use safely.

MCP SSO: All Your Tools, Everywhere

May 14, 2025 · 11 min read

Steve Manuel

CEO, Co-founder @ Dylibso

TL;DR mcp.run has built the first Single Sign-On (SSO) solution for the MCP ecosystem, providing a way for users to maintain a suite of pre-installed, pre-authenticated MCP servers in portable "profiles" that can be used across any MCP-compatible application. This flips traditional integration models by letting users decide which integrations they bring to applications rather than being limited to what the app developer chose to make available.

Book a demo →

Since we've been building MCP infrastructure over the past six months, it has become clear that integrations and the way applications manage them has changed forever:

The future belongs to user-driven integrations, not app-defined ones.

A couple of months ago, I posted on X:

steve.wasm on X - MCP SSO

Today, I'm excited to say that this is a reality! With mcp.run's SSO solution, users can freely manage all their MCP server tools (hosted or remote) in one place, storing authenticated connections into profiles. Access to the tools in a profile can be granted securely to any application which may then use those tools on their behalf.

For applications, this means they are no longer required to build credential management, authentication flows, or maintain OAuth clients for every integration they want to support. Just a single, secure OAuth connection with mcp.run.

The New Frontier of AI Application Integration

AI applications are rapidly transforming how we interact with software, but they face a critical challenge: connecting with the systems where our data and tools live. While the Model Context Protocol (MCP) provides an elegant way for AI to interface with tools through intent-based communication, the question of identity and authentication remains unsolved.

When every AI application wants to connect to dozens of potential services, how do we manage the authentication flow, security, and permissions for each connection? Current approaches create significant friction for users who need to constantly re-authenticate and re-connect their services across different applications.

User-Driven vs. App-Defined Integrations: A Paradigm Shift

In the traditional app-defined integration model, developers pre-select which third-party services their application will support:

Your CRM might offer integrations with Gmail, Outlook, and Slack
Your project management tool connects with GitHub, Jira, and Asana
Your analytics platform pulls data from Google Analytics, Mixpanel, and HubSpot

As a user, you're limited to the connections the app developer chose to build. If your team uses a less common email provider or a specialized tool, you're out of luck unless the app developer decides it's worth supporting.

User-driven integrations flip this model entirely:

Instead of apps deciding which integrations you can use, you bring your own suite of pre-authenticated tools
Instead of connecting the same services repeatedly in each new application, you authenticate once and reuse those connections
Instead of being limited to what the developer prioritized, you can use any tool that offers an MCP Server

This is the difference between "We've added Google Calendar integration!" and "Use any calendar service you want, as long as it has an MCP server."

Introducing the SSO for MCP

To enable this user-driven integration model, mcp.run has built the first SSO solution specifically designed for the Model Context Protocol ecosystem.

How It Works

Our platform provides users with "profiles" – collections of pre-installed, pre-authenticated MCP Servers that can be securely accessed by any MCP Client application:

Users create and manage profiles containing the tools they use
MCP Server connections are authenticated once and securely stored
OAuth integration allows any MCP Client application to request access to a user's profile
Permission management gives users control over which applications can access which tools

This creates a portable identity and access control layer that makes all your MCP tools available to any compatible application.

As seen in the diagram below, a user grants an application to access profiles they create on mcp.run.

Wait, don't MCP clients want to talk to MCP Servers not "profiles"? Yes! a profile on mcp.run is actually a dynamic, virtual MCP Server, which we bundle for the user. Installing anything into a profile flattens the tools from many MCP Servers into a single MCP Server.

MCP SSO Architecture

This user created a profile that provides access to tools from Excel, Sentry, and Gmail, but not from Salesforce, Cloudflare, or GitHub - though they could be managing those in a different mcp.run profile!

The mcp.run SSO solution consists of a few key components:

Profile Management: Where users install and configure their MCP Servers (in profiles on mcp.run)
Authentication Management: Handles user identity and secure OAuth flows, storing and updating access credentials centrally
Access Control Layer: Manages permissions for client applications, authorized via OAuth
MCP Server Delivery: Bundles multiple MCP Servers into a unified MCP Server (delivered as the profile)
Client Registry: Manages registered MCP Client applications, so users know what has access to their tools

When a user connects an MCP Client application to their profile, they see a consent screen like this:

MCP SSO Consent Screen

This allows the user to explicitly authorize what the application can access, maintaining security while enabling seamless connections.

The Power of Profiles for Context Management

Profiles serve another crucial purpose beyond authentication: efficient context management.

When working with LLMs, every tool description consumes valuable context window space. Rather than constantly enabling and disabling individual tools — an arduous process in most MCP clients — profiles allow you to:

Create purpose-specific toolsets: Build different profiles for development, productivity, research, or creative tasks
Minimize context window usage: Only include the tools relevant to your current task
Quickly switch contexts: Change your entire toolset with a single profile switch instead of toggling dozens of individual tools
Optimize for performance: Deliver only the useful subset of MCP Server tools at a time, preserving context window for your actual work

With mcp.run's profile management, you can craft toolsets tailored to specific purposes and switch between them effortlessly. This solves a significant pain point for MCP users who otherwise struggle with overflowing context windows or constantly reconfiguring their tool access.

Does MCP Need Its Own SSO Solution?

The Challenge of Managing Multiple Authenticated Connections

Without a centralized authentication layer for MCP:

Users must repeatedly authenticate the same services across different applications
Developers must implement OAuth flows for each service they want to support, and maintain OAuth clients for every integration
Security becomes inconsistent with varying levels of protection across integrations, and end-users need to recall which apps have access to their tools
Permission management becomes fragmented across multiple applications

But authentication is just the beginning. The real value of a dedicated MCP SSO solution extends far beyond merely identifying users:

Runtime control: With mcp.run, users gain unprecedented control over how their tools are accessed and used in real-time. They can pause access, modify permissions, or revoke connections instantly across all applications.
Comprehensive observability: Our platform provides visibility into which tools are being used, how frequently, by which applications, and for what purposes - offering insights that would be impossible with disconnected authentication systems.
Centralized audit trails: Every tool invocation is logged and viewable, creating accountability and transparency that's critical for sensitive enterprise environments.
Usage patterns and analytics: Users can see how their tools are being utilized across different applications, helping optimize their workflows and profile configurations.

This level of control and observability is impossible with traditional, fragmented authentication approaches, where each integration operates in isolation with no unified oversight.

How OAuth and SSO Traditionally Solve Identity Challenges

Traditional identity solutions like OAuth and SSO have solved these problems for web applications:

OAuth allows users to grant limited access to their accounts without sharing credentials
SSO enables users to access multiple applications with a single login
Permission scopes provide granular control over what applications can access

While existing protocols provide a foundation, MCP presents unique challenges:

Tool discovery needs to be user-centric, not application-centric
LLM agents need secure access to multiple systems simultaneously
Permission management needs to be granular yet simple for users to understand
Connections need to be portable between different MCP Client applications

The mcp.run SSO solution builds on OAuth standards while addressing these specific needs of the MCP ecosystem.

User Choice is Now Critical

As AI assistants become central to our workflows, they need access to whatever tools and data we use. User-driven integrations are essential because:

AI operates across domains: An assistant might need to access your email, calendar, CRM, and project management tools in a single session
Personalization matters: The tools each user needs vary widely
Specialized tools proliferate: No app developer can anticipate and support them all
Integration expectations rise: Users expect AI to work with all their systems, not just popular ones

Flipping the Integration Model

Traditional integration: "Here are the services our app connects with."
User-driven integration: "Here are the services I want to use with your app."

This flips the responsibility from application developers to users, empowering people to create their own integration ecosystems that travel with them across applications.

With mcp.run's SSO platform for MCP, the power shifts to users who can decide which integrations they bring to any application. This isn't just about convenience—it's about giving users true ownership over their digital tools and how they're used.

Real-World Benefits

The combination of MCP and a unified identity layer enables numerous practical benefits:

For Users

Authenticate once: Connect your tools a single time instead of repeatedly across applications
Consistent experience: Your preferred tools follow you across any MCP-compatible application
Granular control: Grant and revoke access to specific tools on a per-app basis or across all apps at once
Privacy protection: Maintain oversight of how applications use your tools and data

For Developers

Reduced complexity: Implement one OAuth flow instead of dozens
Faster time-to-market: Skip building and maintaining authentication systems for each integration
Broader tool ecosystem: Offer access to thousands of tools without building each connection
Focus on core product: Let users bring their tools rather than deciding which to support

For Enterprises

Governance and compliance: Centralized audit trails and permission management
Security standardization: Consistent authentication practices across integrations
Risk reduction: Quickly revoke access across all applications when needed
Resource optimization: Eliminate duplicate integration work across teams

Unlocking the Potential of MCP

The Model Context Protocol has provided a powerful way for AI systems to interact with external tools, but without a unified identity layer, its potential remains constrained by authentication friction and fragmented connections.

mcp.run's SSO solution for MCP addresses this critical gap, enabling truly user-driven integrations where people can bring their own pre-authenticated tools to any compatible application.

This shift from app-defined to user-driven integrations represents a fundamental change in how we think about software integration – one that puts users in control and creates a more flexible, powerful ecosystem for AI applications.

We believe that every application will become an MCP client, and they'll need infrastructure like ours to manage the connectivity and collection of MCP servers their users want to integrate. We're building that infrastructure today.

Reach out to get a preview and sign up for early access. We're rolling this out to select partners now, and aiming for general availability in a couple of months. You can also book a demo and learn more about how to leverage MCP in your own applications.

This blog post is part of our series on MCP infrastructure. Read our previous post on MCP as a differential for modern APIs.

MCP Course: mcp.run on Kubesimplify

April 22, 2025 · 2 min read

Edoardo Vacchi

Staff Research Engineer @ Dylibso

We are excited to release “The MCP Course You Need” with Kubesimplify! This is a detailed course starting from the basics of the Model-Context Protocol, and then going deep into a step-by-step guide of how to interact and build your very own MCP Servers, with a big focus on MCP.run!

The course is available for free on YouTube, and it is a great way to get started with MCP and learn how to write and extend your own agents and chat assistants. The course is designed for both beginners and experienced developers, and it covers a wide range of topics, including:

The basics of the Model-Context Protocol
How to plug MCP servers into a chat assistant
How to use MCP.run to build and deploy your tasks
How to use MCP.run to build and deploy your own tools
How to build your own MCP servlets on MCP.run

and it includes hands-on examples and exercises to help you learn by doing; we will see how to interact with a Kubernetes cluster through a chat assistant, how to build interactive chat bots on Telegram, and more!

This course is a great opportunity to learn about the Model-Context Protocol and how to use it to build powerful and flexible AI applications. We hope you will enjoy it!

Running MCP Tools Securely

April 7, 2025 · 6 min read

Edoardo Vacchi

Staff Research Engineer @ Dylibso

A recent article by Invariant Labs has shown that MCP-based agent frameworks can be vulnerable to "tool poisoning" attacks. In a tool-poisoning attack a malicious server is able to hijack tool calls. These hijacked calls can read sensitive data or execute arbitrary commands on the host machine without notifying the user. This is a serious security concern, as it can lead to unauthorized access to sensitive information, potential lateral movement within systems, and other nefarious activities.

Many of the most exciting MCP demos are all about controlling local applications, such as web browsers, 3D modeling software, or video editors. These demos show how an AI agent can interact with these applications in a natural way, using the same tools that a human would use. This is a powerful capability, but it also raises serious security concerns.

How do we balance the power of MCP with the need for security?

Servlets: Still MCP, Just Lighter and More Secure

On a first glance, mcp.run might look like an MCP Server marketplace. But mcp.run does not just provide a marketplace for arbitrary MCP servers. Instead, it provides a curated set of lightweight "servlets" that are designed to run in a secure environment.

We call them "servlets" to emphasize they don't run as full-fledged MCP servers on the user's machine. Instead, they share a host process service that runs them in an isolated environment with limited access to the host machine's resources. This is usually called a "sandbox".

Servlets do not share data with each other, and the data that is shared with the host process is limited to the specific pieces of information that are needed for the servlet to work. This means that even if one servlet is compromised, it cannot just access the data or the resources of another servlet: these will be still mediated by the permissions that the other servlet has been granted. This is a key security feature of mcp.run, and it greatly mitigates "tool poisoning" attacks.

The file-system servlet may only access one specific user-defined directory. The brave-search servlet may only access the Brave Search API.

Each servlet is given explicit access to a specific set of resources and capabilities, which are defined in their configuration. This means that they are only given access to resources and capabilities that they have explicitly declared. For example:

The filesystem servlet can access only a file system portions that it explicitly requested; it cannot read files outside the given boundaries; and it cannot access the Internet.
The brave-search servlet can access the Brave Search API, but it cannot access any other web resources.
The fetch servlet is only able to retrieve the contents of a web site.
The eval-py servlet evaluates Python code but it can neither access the Internet nor the file system.

All of the capabilities that a Servlet requires must be explicitly granted upon installation, and they are not allowed otherwise. The installation page for a servlet displays the full list of capabilities that the servlet will be granted, and the user must explicitly accept them before the servlet can be installed; in some cases, they can even be further restricted if the user decides so.

More Safeguards

It is also very common for an MCP server to read configuration parameters from a file or from environment variables; this includes sensitive information such as passwords or API keys. But mcp.run servlets are instead configured through a standardized interface, and the credentials are stored in a secure way. Thus, there is no need to store sensitive information in configuration files or environment variables. In fact, servlets cannot access environment variables in any way.

Moreover, servlets cannot access other sensitive system resources such as the clipboard; and even when the servlet is granted access to the file system, it is limited to a specific directory that is defined in the configuration. This means no servlet has unrestricted access to sensitive files or directories on the host machine, such as your ~/.cursor/mcp.json or your SSH keys, without explicit user consent.

API Integration: Servlets as a Security Boundary

A reason for MCP's success is the protocol is simple at its core, making it easy to write your own MCP server. We believe that in the future, many organizations will want to implement their own MCP servers:

you will have to write your own server when you want to plug your own APIs;
you might want to orchestrate articulate API flows and avoid filling the context window with irrelevant data;
but, most importantly, and yet often overlooked, you will need to write your own MCP server if you want to retain full-control over the content surface that will "leak" into the token stream.

Because of how LLMs currently work, tool calls require giving your AI service provider unencrypted access to sensitive data. Even when tool calls are performed locally, unless you run LLMs on-premises, a third-party service is given access to all exchanged data.

While writing an MCP server is relatively easy, creating an mcp.run servlet is even easier. Servlets run in a shared but secure host environment that implements the protocol and maintains a stable connection. You only need to implement the logic of your tools, and the host handles the remaining details.

Writing a servlet is easy and fun; it allows you to retain control over your data, and it brings also performance benefits: controlling the amount of data that is returned into the token stream, allows to make sure that the the AI service is not overwhelmed and can focus on relevant information.

You can write your servlets in multiple programming languages: TypeScript, Python, Go, Rust, C++, Zig, and you can even bring your own.

WebAssembly: A Portable Sandbox

"A sandboxed environment where you can run custom code, I know this: this is a Container!" you might think. But it is not! mcp.run servlets run on a WebAssembly runtime; in other words, they are running in a lightweight, portable virtual machine with efficient sandboxing.

This also means that they do not need to run on your local machine. You can fetch them from your other devices, such as phones or tablets, and run them even there, processing data locally and only sending the results to the AI service. You could even run them in a browser, without the need to install any software.

Finally, you can offload execution to our servers for lighter-weight processing; but, if you choose so, you can also run them on premises and process data in a secure environment you trust.

Speaking of secure environments you can trust, have you checked out mcp.run Tasks?

Conclusion

We look forward to see what you will build with mcp.run! If you want to learn more about mcp.run and how it can help bringing practical, secure and customized AI automation to your organization, get in touch!

MCP: The Differential for Modern APIs and Systems

March 27, 2025 · 14 min read

Steve Manuel

CEO, Co-founder @ Dylibso

TL;DR

MCP enables vastly more resilient system integrations by acting as a differential between APIs, absorbing changes and inconsistencies that would typically break traditional integrations. With "the prompt is the program," we can build more adaptable, intention-based integrations that focus on what needs to be done rather than how it's technically implemented.

Try it now →

The Problem with Traditional API Integration

When engineers connect software systems, they're engaging in an unspoken contract: "I will send you data in exactly this format, and you will do exactly this with it." This rigid dependency creates brittle integrations that break when either side makes changes.

Traditional API integration looks like this:

// System A rigidly calls System B with exact parameters
const response = await fetch("https://api.system-b.com/v2/widgets", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": "Bearer " + token,
  },
  body: JSON.stringify({
    name: widgetName,
    color: "#FF5733",
    dimensions: {
      height: 100,
      width: 50,
    },
    metadata: {
      created_by: userId,
      department: "engineering",
    },
  }),
});

// If System B changes ANY part of this structure, this code breaks

This approach has several fundamental problems:

Version Lock: Systems get locked into specific API versions
Brittle Dependencies: Small changes can cause catastrophic failures
High Maintenance Burden: Keeping integrations working requires constant vigilance and updates
Implementation Details Exposure: Systems need to know too much about each other

The Automobile Differential: A Mechanical Analogy

To understand how MCP addresses these issues, let's examine a mechanical engineering breakthrough: the differential gear in automobiles.

Before the differential, cars had a serious problem. When turning a corner, the outer wheel needs to travel farther than the inner wheel. With both wheels rigidly connected to the same axle, this created enormous stress, causing wheels to slip, skid, and wear out prematurely.

The differential solved this by allowing wheels on the same axle to rotate at different speeds while still delivering power to both. It absorbed the differences between what each wheel needed.

MCP: The API Differential

MCP functions as an API differential – it sits between systems and absorbs their differences, allowing them to effectively work together despite technical inconsistencies.

How does this work? Through intention-based instructions rather than implementation-specific calls.

Traditional API Calls vs. MCP Calls

Traditional API:

// Extremely specific requirements
await client.createWidget({
  name: "Customer Dashboard",
  type: "analytics",
  visibility: "team",
  access_level: 3,
  resource_id: "res_8675309",
  parent_id: "dash_12345",
  metadata: {
    created_by: "user_abc123",
    department_id: 42,
  },
});

MCP Approach:

// High-level intention
"Create a new analytics widget called 'Customer Dashboard' 
for the marketing team's main dashboard"

With the MCP approach, the client doesn't need to know all the specific parameters upfront. Instead, it engages in a discovery-based workflow:

How MCP Clients Resolve Required Parameters

Tool Discovery, Dependency Analysis, Parameter Resolution

When an MCP Client receives a high-level instruction like creating a dashboard widget, it follows a sophisticated process:

Tool Discovery: The client first queries the MCP Server for available tools:

// MCP Client requests available tools
const toolList = await listTools();

// Server returns available tools with their descriptions
[
  {
    "name": "create_widget",
    "description": "Creates a new widget on a dashboard",
    "inputSchema": {
      "type": "object",
      "required": ["name", "dashboard_id", "widget_type"],
      "properties": {
        "name": {
          "type": "string",
          "description": "Display name for the widget",
        },
        "dashboard_id": {
          "type": "string",
          "description": "ID of the dashboard to add the widget to",
        },
        "widget_type": { ... },
        "team_id": { ... },
        // Other parameters...
      },
    },
  },
  {
    "name": "list_dashboards",
    "description": "List available dashboards, optionally filtered by team",
    "inputSchema": { ... },
  },
  {
    "name": "get_user_info",
    "description": "Get information about the current user or a specified user",
    "inputSchema": { ... },
  },
  {
    "name": "get_team_info",
    "description": "Get information about a team",
    "inputSchema": { ... },
  },
];

Dependency Analysis: The LLM analyzes the instruction and the available tools, recognizing that to create a widget, it needs:
- The dashboard ID for "marketing team's main dashboard"
- Possibly the team ID for the "marketing team"
Parameter Resolution: The LLM plans and executes a sequence of dependent calls:

(This is deterministic code for demonstration, but the model infers these steps on its own!)

// First, get the marketing team's ID
const teamResponse = await callTool("get_team_info", {
  team_name: "marketing",
});
// teamResponse = { team_id: "team_mktg123", department_id: 42, ... }

// Next, find the marketing team's main dashboard
const dashboardsResponse = await callTool("list_dashboards", {
  team_name: "marketing",
});
// dashboardsResponse = [
//   { id: "dash_12345", name: "Main Dashboard", is_primary: true, ... },
//   { id: "dash_67890", name: "Campaign Performance", ... }
// ]

// Filter for the main dashboard
const mainDashboard = dashboardsResponse.find((d) => d.is_primary) ||
  dashboardsResponse.find((d) => d.name.toLowerCase().includes("main"));

// Finally, create the widget with all required parameters
const widgetResponse = await callTool("create_widget", {
  name: "Customer Dashboard",
  dashboard_id: mainDashboard.id,
  widget_type: "analytics",
  team_id: teamResponse.team_id,
});

Semantic Mapping: The MCP Server handles translating from the standardized tool parameters to the specific API requirements, which might involve:
- Translating team_id to the internal department_id format
- Setting the appropriate access_level based on team permissions
- Generating a unique resource_id
- Populating metadata based on contextual information

This approach is revolutionary because:

Resilient to Changes: If the underlying API changes (e.g., requiring new parameters or renaming fields), only the MCP Server needs to update – the high-level client instruction stays the same
Intent Preservation: The focus remains on what needs to be accomplished, not how
Progressive Enhancement: New API capabilities can be leveraged without client changes
Contextual Intelligence: The LLM can make smart decisions about which dashboard is the "main" one based on naming, flags, or other context

With MCP, the instruction doesn't change even when the underlying API changes dramatically. The MCP Server handles the translation from high-level intent to specific API requirements.

Why This Matters for System Resilience

In an MCP-enabled world, system upgrades, API changes, and even complete backend replacements can happen with minimal disruption to connected systems.

Consider a company migrating from Salesforce to HubSpot. Traditionally, this would require rewriting dozens or hundreds of integrations. With MCP:

The high-level instructions ("Create a new contact for John Smith from Acme Corp") stay the same
Only the MCP Server implementation changes to target the new system
Existing systems continue functioning with minimal disruption

The Technical Magic Behind MCP's Differential Capabilities

MCP achieves this resilience through several key mechanisms:

1. Dynamic Tool Discovery

Unlike traditional SDKs where available methods are fixed at compile/build time, MCP Clients discover tools dynamically at runtime. This means:

New capabilities can be added without client updates
Deprecated features can be gracefully removed
Feature flags can be applied at the server level

2. Semantic Parameter Mapping

Semantic Parameter Mapping enables systems to communicate based on meaning rather than rigid structure. This approach drastically improves resilience and adaptability in integrations.

In traditional API integration, parameters are often implementation-specific and tightly coupled to the underlying data model. For example, a CRM API might require a customer record to be created with fields like cust_fname, cust_lname, and cust_type_id - names that reflect internal database schema rather than their semantic meaning.

With MCP's semantic parameter mapping, tools are defined with parameters that reflect their conceptual purpose, not their technical implementation:

Traditional API Parameters:

{
  "cust_fname": "Jane",
  "cust_lname": "Smith",
  "cust_type_id": 3,
  "cust_status_cd": "A",
  "addr_line1": "123 Main St",
  "addr_city": "Springfield",
  "addr_state_cd": "IL",
  "addr_zip": "62701",
  "cust_src_id": 7,
  "rep_emp_id": "EMP82736"
}

MCP Tool Description with Semantic Parameters:

{
  "name": "create_customer",
  "description": "Create a new customer record in the CRM system",
  "inputSchema": {
    "type": "object",
    "properties": {
      "firstName": {
        "type": "string",
        "description": "Customer's first or given name"
      },
      "lastName": {
        "type": "string",
        "description": "Customer's last or family name"
      },
      "customerType": {
        "type": "string",
        "enum": ["individual", "business", "government", "non-profit"],
        "description": "The category of customer being created"
      },
      "address": {
        "type": "object",
        "description": "Customer's primary address",
        "properties": {
          "street": {
            "type": "string",
            "description": "Street address including number and name"
          },
          "city": {
            "type": "string",
            "description": "City name"
          },
          "state": {
            "type": "string",
            "description": "State, province, or region"
          },
          "postalCode": {
            "type": "string",
            "description": "ZIP or postal code"
          },
          "country": {
            "type": "string",
            "description": "Country name",
            "default": "United States"
          }
        }
      },
      "source": {
        "type": "string",
        "description": "How the customer was acquired (e.g., 'website', 'referral', 'trade show')"
      },
      "assignedRepresentative": {
        "type": "string",
        "description": "Name or identifier of the sales representative assigned to this customer",
        "required": false
      }
    },
    "required": ["firstName", "lastName", "customerType"]
  }
}

The key differences in the semantic approach:

Human-readable parameter names: Using firstName instead of cust_fname makes the parameters self-descriptive
Hierarchical organization: Related parameters like address fields are nested in a logical structure
Descriptive enumerations: Instead of opaque codes (like cust_type_id: 3), semantically meaningful values like "business" are used
Clear descriptions: Each parameter includes a description of its purpose rather than just its data type
Meaningful defaults: When appropriate, semantic defaults can be provided

This semantic approach provides tremendous advantages:

Future-proofing

If the underlying CRM changes its internal codes or structure, only the mapping function in the MCP Server needs to be updated

Interoperability

Multiple different CRM systems could implement MCP Servers with the same semantic parameters, allowing seamless switching between backends

Conceptual clarity

People and AI systems can more easily understand the parameters and their purpose

Field validation

Semantic validation becomes possible (e.g., ensuring state names are valid) rather than just type checking

Extensibility

New parameters can be added to the semantic schema without breaking existing integrations

When combined with intent-based execution, semantic parameter mapping creates a powerful abstraction layer that shields systems from the implementation details of their integration partners, making the entire ecosystem more adaptable and resilient to change.

3. Intent-Based Execution

Intent-based execution is perhaps the most transformative aspect of MCP. Let me walk you through a detailed example that illustrates how this works in practice.

Imagine a scenario where a business wants to "send a quarterly performance report to all department heads." This seemingly simple task involves multiple steps and systems in a traditional integration context:

Traditional Integration Approach:

Query the HR system to identify department heads
Access the financial system to gather quarterly performance data
Generate a PDF report using a reporting engine
Connect to the email system to send personalized emails with attachments
Log the communication in the CRM system

Each of these steps would require detailed knowledge of the respective APIs, authentication methods, data formats, and error handling. If any system changes its API, the entire integration could break.

With MCP's Intent-Based Execution:

The MCP Client (like Tasks) might simply receive the instruction:

"Send our Q1 2024 performance report to all department heads. Include YoY comparisons and highlight areas exceeding targets by more than 10%."

Behind the scenes, the MCP Client would:

Recognize the high-level intent and determine this requires multiple tool calls
Query the MCP Server for available tools related to reporting, employee data, and communications
Based on the tool descriptions, construct a workflow:

(Again, not executed code, but to illustrate the inferred logic the LLM runs!)

// In step 2 listed above, the MCP Client has all the
// static identifiers and parameters from the available
// tool descriptions to be used in this code

// First, identify who the department heads are
const departmentHeads = await callTool("get_employees", {
  filters: { position_type: "department_head", status: "active" },
});

// Get financial performance data for Q1 2024
const financialData = await callTool("get_financial_report", {
  period: "q1_2024",
  metrics: ["revenue", "expenses", "profit_margin", "growth"],
  comparisons: ["year_over_year"],
});

// The LLM analyzes the data to identify high-performing areas
const highlights = financialData.metrics.filter(
  (metric) => metric.year_over_year_change > 10,
);

// Generate a report with the appropriate formatting and emphasis
const report = await callTool("create_report", {
  title: "Q1 2024 Performance Report",
  data: financialData,
  highlights: highlights,
  format: "pdf",
  template: "quarterly_executive",
});

// Send the report to each department head with a personalized message
for (const head of departmentHeads) {
  await callTool("send_email", {
    recipient: head.email,
    subject: "Q1 2024 Performance Report",
    body:
      `Dear ${head.name},\n\nPlease find attached our Q1 2024 performance report. Your department ${head.department} showed ${
        highlights.some((h) => h.department === head.department)
          ? "exceptional performance in some areas"
          : "consistent results"
      }.\n\nRegards,\nExecutive Team`,
    attachments: [report.file_id],
    log_to_crm: true,
  });
}

The crucial difference is that the MCP Server for each system is responsible for translating these semantic, intent-based calls into whatever specific API calls its system requires.

For example, the HR system's MCP Server might translate get_employees with a position or role filter into a complex SQL query or LDAP search, while the reporting system's MCP Server might convert create_report into a series of API calls to a business intelligence platform.

If any of these backend systems change:

The HR system might switch from an on-premise solution to Workday
The financial system might upgrade to a new version with a completely different API
The reporting engine might be replaced with a different vendor
The email system might move from Exchange to Gmail

None of these changes would affect the high-level intent-based instruction. Only the corresponding MCP Servers would need to be updated to translate the same semantic calls into the new underlying system's language.

This is the true power of intent-based execution with MCP - it decouples what you want to accomplish from the technical details of how to accomplish it, creating resilient integrations that can withstand significant changes in the underlying technology landscape.

Building Resilient Systems with MCP

To leverage MCP as a differential in your own systems:

Focus on intent over implementation: Design your tools around what they accomplish, not how
Embrace semantic parameters: Name and structure parameters based on their meaning, not your current implementation
Build for adaptation: Assume underlying APIs will change and design your MCP Servers to absorb these changes

The Future of System Integration

As we move toward a world where "the prompt is the program," traditional rigid API contracts will increasingly be replaced by intent-based interfaces. MCP provides a standardized protocol for this transition.

The implications are profound:

Reduced integration maintenance: Systems connected via MCP require less ongoing maintenance
Faster adoption of new technologies: Backend systems can be replaced without disrupting front-end experiences
Greater composability: Systems can be combined in ways their original designers never anticipated
Longer component lifespan: Software components can remain useful far longer despite ecosystem changes

The differential revolutionized transportation by solving a mechanical impedance mismatch. MCP is poised to do the same for software integration by solving the API impedance mismatch that has plagued systems for decades.

The future of integration isn't more rigid contracts – it's more flexible, intent-based communication between systems that can adapt as technology evolves.

Get Started with MCP

Ready to build more resilient integrations? Here's how to start:

Explore the MCP specification at mcp.run
Install mcpx, or generate an SSE URL from your Profile to start using MCP tools immediately in any MCP Client
Consider which of your systems would benefit from an MCP interface
Join our community to learn MCP best practices

Introducing mcpx-eval: An open-ended tool use evaluation framework

March 3, 2025 · 4 min read

Zach Shipko

Chief Researcher, Co-founder @ Dylibso

The AI landscape is evolving rapidly, and with almost daily advancements, it can be hard to understand which large language model (LLM) makes the most sense for the task at hand. To make things more complicated, each new model release comes with new claims regarding capabilities that don't necessarily translate to all workloads. In some cases, it will make sense to stick with an older model to save on time or cost, but for others, you might need the cutting-edge reasoning abilities of a newer model. The only way to really understand how an LLM will perform a specific task is to try it out (usually a bunch of times). This is why we're happy to announce mcpx-eval, our new eval framework to compare how different models perform with mcp.run tools!

The Berkeley Function Calling Leaderboard (BFCL) is the state-of-the-art for measuring how well various models perform specific tool calls. BFCL includes many specific tools created for particular tests being performed, which allows for tool use to be tracked closely. While this is extremely valuable and we may still try to get mcp.run tools working in BFCL, we decided it would be worth building an mcp.run-focused eval that could be used by us and our customers to compare the results of open-ended tasks.

This means instead of looking at very specific outcomes around tool use, we're primarily interested in trying to quantify the LLM output to determine how well they perform in situations like Tasks, where many tools can be combined to create varied results. To do this, we're using a custom set of LLM-based metrics that are scored by a secondary "judge" LLM. Using static reference-based metrics can provide more deterministic testing of prompts but are less helpful when it comes to interpreting the "appropriateness" of a tool call or the "completeness" of a response.

To peek behind the curtain a little, we can see how the judge prompt is structured:

<settings>
Max tool calls: {max_tool_calls}
Current date and time: {datetime.now().isoformat()}
</settings>
<prompt>{prompt}</prompt>
<output>{output}</output>
<check>{check}</check>
<expected-tools>{', '.join(expected_tools)}</expected-tools>

There are several sections present: <settings>, <prompt>, <output>, <check> and <expected-tools>

settings provides additional context for an evaluation
prompt contains the original test prompt so the judge can assess the completeness of the output
output contains all the messages from the test prompt, including any tools that were used
check contains user-provided criteria that can be used to judge the output
expected-tools is a list of tools that can be expected to be used for the given prompt

Using this template, the output of each LLM under test along with the check criteria is converted into a new prompt and analyzed by the judge for various metrics which we are able to extract into structured scoring data using PydanticAI. The result of the judge is a Python object with fields that hold the numeric scores, allowing us to analyze the data using a typical data processing library like Pandas.

This is just scratching the surface - our preliminary results already reveal interesting patterns in how different models handle complex tool use scenarios. For example, one of the biggest problems we're seeing is over-use, which doesn't affect the quality of the output but does affect compute time and cost. These measurements can help us provide the information to give teams more confidence when selecting models for use with mcp.run tools.

Looking forward, we're planning to expand our test suite to include more models and cover more domains. Please reach out if you have a specific use case you'd like to see evaluated or are wondering how a particular model performs for your task. The mcpx-eval code is also available on Github.

MCP March Madness

March 3, 2025 · 27 min read

Steve Manuel

CEO, Co-founder @ Dylibso

MCP March Madness

Announcing the first MCP March Madness Tournament!

This month, we're hosting a face-off like you've never seen before. If "AI Athletes" wasn't on your 2025 bingo card, don't worry... it's not quite like that.

We're putting the best of today's API-driven companies head-to-head in a matchup to see how their MCP servers perform when tool-equipped Large Language Models (LLMs) are tasked with a series of challenges.

How It Works

Each week in March, we will showcase two competing companies in a test to see how well AI can use their product through API interactions.

This will involve a Task that describes specific work requiring the use of the product via its API.

For example:

"Create a document titled 'World Populations' and add a table containing the world's largest countries including their name, populations, and ranked by population size. Provide the URL to this new document in your response."

In this example matchup, we'd run this exact same Task twice - first using Notion tools, and in another run, Google Docs. We'll compare the outputs, side-effects, and the time spent executing. Using models from Anthropic and OpenAI, we'll record each run so you can verify our findings!

We're keeping these tasks fairly simple to test the model's accuracy when calling a small set of tools. But don't worry - things will get spicier in the Grand Finale!

Additionally, we're releasing our first evaluation framework for MCP-based tool calling, which we'll use to run these tests. We really want to exercise the tools and prompts as fairly as possible, and we'll publish all evaluation data for complete transparency.

Selecting the MCPs

As a prerequisite, we're using MCPs available on our public registry at www.mcp.run. This means they may not have been created directly by the API providers themselves. Anyone is able to publish MCP servlets (WebAssembly-based, secure & portable MCP Servers) to the registry. However, to ensure these matchups are as fair as possible, we're using servlets that have been generated from the official OpenAPI specifications provided by each company.

MCP Servers [...] generated from the official OpenAPI specification provided by each company.

Wait, did I read that right?

Yes, you read that right! We'll be making this generator available later this month... so sign up and follow-along for that announcement.

So, this isn't just a test of how well AI can use an API - it's also a test of how comprehensive and well-designed a platform's API specification is!

NEW: Our Tool Use Eval Framework

As mentioned above, we're excited to share our new evaluation framework for LLM tool calling!

mcpx-eval is a framework for evaluating LLM tool calling using mcp.run tools. The primary focus is to compare the results of open-ended prompts, such as mcp.run Tasks. We're thrilled to provide this resource to help users make better-informed decisions when selecting LLMs to pair with mcp.run tools.

If you're interested in this kind of technology, check out the repository on GitHub, and read our full announcement for an in-depth look at the framework.

The Grand Finale

After 3 rounds of 1-vs-1 MCP face-offs, we're taking things up a notch. We'll put the winning MCP Servers on one team and the challengers on another. We'll create a new Task with more sophisticated work that requires using all 3 platforms' APIs to complete a complex challenge.

May the best ~~man~~ MCP win!

2025 Schedule

Round 1: Supabase vs. Neon

📅 March 5, 2025

Round 1: Supabase vs. Neon

We put two of the best Postgres platforms head-to-head to kick-off MCP March Madness! Watch the matchup in realtime, executing a Task to ensure a project is ready to use and to generate and execute the SQL needed to set up a database to manage our NewsletterOS application.

Here's the prompt:

Pre-requisite:
- a database and or project to use inside my account
  - name: newsletterOS
  
Create the tables necessary to act as the primary transactional database for a Newsletter Management System, where its many publishers manage the creation of newsletters and the subscribers to each newsletter.

I expect to be able to work with tables of information including data on: 
- publisher
- subscribers
- newsletters 
- subscriptions (mapping subscribers to newsletters)
- newsletter_release (contents of newsletter, etc)
- activity (maps publisher to a enum of activity types & JSON) 

Execute the necessary queries to set my database up with this schema.

In each Task, we attach the Supabase and Neon mcp.run servlets to the prompt, giving our Task access to manage those respective accounts on our behalf via their APIs.

See how Supabase handles our Task as Claude Sonnet 3.5 uses MCP server tools:

Next, see how Neon handles the same Task, leveraging Claude Sonnet 3.5 and the Neon MCP server tools we generated from their OpenAPI spec.

The results are in... and the winner is...

🏆 Supabase 🏆

Summary

Unfortunately, Neon was unable to complete the Task as-is, using only its functionality exposed via their official OpenAPI spec. But, they can (and hopefully will!) make it so an OpenAPI consumer can run SQL this way. As noted in the video, their hand-writen MCP Server does support this. We'd love to make this as feature-rich on mcp.run so any Agent or AI App in any language or framework (even running on mobile devices!) can work as seamlessly.

Eval Results

In addition to the Tasks we ran, we also executed this prompt with our own eval framework mcpx-eval. We configure this eval using the following, and when it runs we provide the profile where the framework can load and call the right tools:

name = "neon-vs-supabase"
max-tool-calls = 100

prompt = """
Pre-requisite:
- a database and or project to use inside my account
  - name: newsletterOS
  
Create the tables and any seed data necessary to act as the primary transactional database for a Newsletter Management System, where its many publishers manage the creation of newsletters and the subscribers to each newsletter.

Using tools, please create tables for: 
- publisher
- subscribers
- newsletters 
- subscriptions (mapping subscribers to newsletters)
- newsletter_release (contents of newsletter, etc)
- activity (maps publisher to a enum of activity types & JSON) 


Execute the necessary commands to set my database up for this. 

When all the tables are created, output the queries to describe the database.
"""

check="""
Use tools and the output of the LLM to check that the tables described in the <prompt> have been created.

When selecting tools you should never in any case use the search tool.
"""

ignore-tools = [
  "v1_create_a_sso_provider",
  "v1_update_a_sso_provider",
]

Supabase vs. Neon (with OpenAI GPT-4o)

Supabase mcpx-eval: OpenAI GPT-4o

(click to enlarge)

Neon (left) outperforms Supabase (right) on the accuracy dimension by a few points - likely due to better OpenAPI spec descriptions, and potentially more specific endpoints. These materialize as tool calls and tool descriptions, which are provided as context to the inference run, and make a big difference.

In all, both platforms did great, and if Neon adds query execution support via OpenAPI, we'd be very excited to put it to use.

Next up!

Stick around for another match-up next week... details below ⬇️

Round 2: Resend vs. Loops

📅 March 12, 2025

Round 2: Resend vs. Loops

Everybody loves email, right? Today we're comparing some popular email platforms to see how well their APIs are designed for AI usage. Can an Agent or AI app successfully carry out our task? Let's see!

Here's the prompt:

Unless it already exists, create a new audience for my new newsletter called: "{{ audience }}"
 
Once it is created, add a test contact: "{{ name }} {{ email }}".

Then, send that contact a test email with some well-designed email-optimized HTML that you generate. Make the content and the design/theme relevant based on the name of the newsletter for which you created the audience.

Notice how we have parameterized this prompt with replacement parameters! This allows mcp.run Tasks to be dynamically updated with values - especially helpful when triggering them from an HTTP call or Webhook.

In each Task, we attach the Resend and Loops mcp.run servlets to the prompt, giving our Task access to manage those respective accounts on our behalf via their APIs.

See how Resend handles our Task using its MCP server tools:

Next, see how Loops handles the same Task, leveraging the Loops MCP server tools we generated from their OpenAPI spec.

The results are in... and the winner is...

🏆 Resend 🏆

Summary

Similar to Neon in Round 1, Loops was unable to complete the Task as-is, using only its functionality exposed via their official OpenAPI spec. Hopefully they add the missing API surface area to enable an AI application or Agent to send transactional email along with a new template on the fly.

Resend was clearly designed to be extremely flexible, and the model was able to figure out exactly what it needed to do in order to perfectly complete our Task.

Eval Results

name = "loops-vs-resend"

prompt = """
Unless it already exists, create a new audience for my new newsletter called: "cat-facts"
Once it is created, add a test contact: "Zach [email protected]"

Then, send that contact a test email with some well-designed email-optimized HTML that you generate. Make the content and the design/theme relevant based on the name of the newsletter for which you created the audience.
"""

check="""
Use tools to check that the audience exists and that the email was sent correctly
"""

Resend vs. Loops (with OpenAI GPT-4o)

Supabase mcpx-eval: Claude Sonnet 3.7

(click to enlarge)

Resend (left) outperforms Loops (right) accross the board. In part due to Loops missing functionality to complete the task, but also likely that Resend's OpenAPI spec is extremely comprehensive and includes very rich descriptions and detail.

Remember, all of this makes its way into the context of the inference request, and influences how the model decides to respond with a tool request. The better your descriptions, the more accurately the model will use your tool!

Next up!

Stick around for another match-up next week... details below ⬇️

Round 3: Perplexity vs. Brave Search

📅 March 19, 2025

Round 3: Perplexity vs. Brave Search

If you're looking for something like DeepResearch, without the "PhD-level reasoning" or the price tag that goes along with it, then this is the round for you!

Perplexity is a household name, and packs a punch for sourcing relevant and recent information on any subject. Through its Sonar API, we can programmatically make our way through the web. Brave exposes its powerful, more traditional search engine via API. Which one can deliver the best results for us when asked to find recent news and information about a given topic?

Here's the prompt:

We need to find the latest, most interesting and important news for people who have subscribed to our "{{ topic }}" newsletter. 

To do this, search the web for news and information about {{ topic }}, do many searches for newly encountered & highly related terms, associated people, and other related insights that would be interesting to our subscribers.

These subscribers are very aware of {{ topic }} space and what is happening, so when we find a good source on the web, also add some intellegent & researched prose around the article or content. Limit this to just a sentence or two, and include it in the output you provide. 

Output all of the links you find on the web and your expert additional prose in a Markdown format so it can be read and approved by a 3rd party.

Notice how we have parameterized this prompt with replacement parameters! This allows mcp.run Tasks to be dynamically updated with values - especially helpful when triggering them from an HTTP call or Webhook.

In each Task, we attach the Perplexity and Brave Search mcp.run servlets to the prompt, giving our Task access to manage those respective accounts on our behalf via their APIs.

This round, we've combined the Task runs into a single video. Check them out:

Content Output

Here's the full, rendered output from each of the Tasks run in the video. What do you think, which did a better job finding us results for "AI Agent" news to include in a newsletter?

Perplexity
Brave Search

NVIDIA's Game-Changing AI Agent Infrastructure

NVIDIA AI-Q Blueprint and AgentIQ Toolkit

NVIDIA unveiled AI-Q, a comprehensive Blueprint for developing agentic systems that's reshaping how enterprises build AI agents. The framework integrates NVIDIA's accelerated computing with partner storage platforms and software tools.

The AI-Q Blueprint represents NVIDIA's strategic move to dominate the enterprise AI agent infrastructure market, positioning them as the essential foundation for companies building sophisticated agent systems.

Llama Nemotron Model Family

NVIDIA launched the Llama Nemotron family of open reasoning AI models designed specifically for agent development. Available in three sizes (Nano: 8B, Super: 49B, and Ultra: 253B parameters), these models offer advanced reasoning capabilities with up to 20% improved accuracy over base Llama models.

These models are particularly significant as they offer hybrid reasoning capabilities that let developers toggle reasoning on/off to optimize token usage and costs—a critical feature for enterprise deployment that could accelerate adoption.

Enterprise AI Agent Adoption

Industry Implementation Examples

Yum Brands is deploying voice ordering AI agents in restaurants, with plans to roll out to 500 locations this year.
Visa is using AI agents to streamline cybersecurity operations and automate phishing email analysis.
Rolls-Royce has implemented AI agents to assist service desk workers and streamline operations.

While these implementations show promising use cases, the ROI metrics remain mixed—only about a third of C-suite leaders report substantial ROI in areas like employee productivity (36%) and cost reduction, suggesting we're still in early stages of effective deployment.

Zoom's AI Companion Enhancements

Zoom introduced new agentic AI capabilities for its AI Companion, including calendar management, clip generation, and advanced document creation. A custom AI Companion add-on is launching in April at $12/user/month.

Zoom's approach of integrating AI agents directly into existing workflows rather than as standalone tools could be the key to avoiding the "productivity leak" problem, where 72% of time saved by AI doesn't convert to additional throughput.

Developer Tools and Frameworks

OpenAI Agents SDK

OpenAI released a new set of tools specifically designed for building AI agents, including a new Responses API that combines chat capabilities with tool use, built-in tools for web search, file search, and computer use, and an open-source Agents SDK for orchestrating single-agent and multi-agent workflows.

This release significantly lowers the barrier to entry for developers building sophisticated agent systems and could accelerate the proliferation of specialized AI agents across industries.

Eclipse Foundation Theia AI

The Eclipse Foundation announced two new open-source AI development tools: Theia AI (an open framework for integrating LLMs into custom tools and IDEs) and an AI-powered Theia IDE built on Theia AI.

As an open-source alternative to proprietary development environments, Theia AI could become the foundation for a new generation of community-driven AI agent development tools.

Research Breakthroughs

Multi-Agent Systems

Recent research has focused on improving inter-agent communication and cooperation, particularly in autonomous driving systems using LLMs. The development of scalable multi-agent frameworks like Nexus aims to make MAS development more accessible and efficient.

The shift toward multi-agent systems represents a fundamental evolution in AI agent architecture, moving from single-purpose tools to collaborative systems that can tackle complex, multi-step problems.

SYMBIOSIS Framework

Cabrera et al. (2025) introduced the SYMBIOSIS framework, which combines systems thinking with AI to bridge epistemic gaps and enable AI systems to reason about complex adaptive systems in socio-technical contexts.

This framework addresses one of the most significant limitations of current AI agents—their inability to understand and navigate complex social systems—and could lead to more contextually aware and socially intelligent agents.

Ethical and Regulatory Developments

EU AI Act Implementation

The EU AI Act, expected to be fully implemented by 2025, introduces a risk-based approach to regulating AI with stricter requirements for high-risk applications, including mandatory risk assessments, human oversight mechanisms, and transparency requirements.

As the first comprehensive AI regulation globally, the EU AI Act will likely set the standard for AI agent governance worldwide, potentially creating compliance challenges for companies operating across borders.

Industry Standards Emerging

Organizations like ISO/IEC JTC 1/SC 42 and the NIST AI Risk Management Framework are developing guidelines for AI governance, including specific considerations for autonomous agents.

These standards will be crucial for establishing common practices around AI agent development and deployment, potentially reducing fragmentation in approaches to AI safety and ethics.

This newsletter provides a snapshot of the rapidly evolving AI Agent landscape. As always, we welcome your feedback and suggestions for future topics.

The Rise of Autonomous AI Agents in 2025

IBM Predicts the Year of the Agent

According to IBM's recent analysis, 2025 is shaping up to be "the year of the AI agent." A survey conducted with Morning Consult revealed that 99% of developers building AI applications for enterprise are exploring or developing AI agents. This shift from passive AI assistants to autonomous agents represents a fundamental evolution in how AI systems operate and interact with the world.

IBM's AI Agents 2025: Expectations vs. Reality

IBM's research highlights a crucial distinction between today's function-calling models and truly autonomous agents. While many companies are rushing to adopt agent technology, IBM cautions that most organizations aren't yet "agent-ready" - the real challenge lies in exposing enterprise APIs for agent integration.

China's Manus: A Revolutionary Autonomous Agent

In a significant development, Chinese researchers launched Manus, described as "the world's first fully autonomous AI agent." Manus uses a multi-agent architecture where a central "executor" agent coordinates with specialized sub-agents to break down and complete complex tasks without human intervention.

Forbes: China's Autonomous Agent, Manus, Changes Everything

Manus represents a paradigm shift in AI development - not just another model but a truly autonomous system capable of independent thought and action. Its ability to navigate the real world "as seamlessly as a human intern with an unlimited attention span" signals a new era where AI systems don't just assist humans but can potentially replace them in certain roles.

MIT Technology Review's Hands-On Test of Manus

MIT Technology Review recently tested Manus on several real-world tasks and found it capable of breaking tasks down into steps and autonomously navigating the web to gather information and complete assignments.

MIT Technology Review: Everyone in AI is talking about Manus. We put it to the test.

While experiencing some system crashes and server overload, MIT Technology Review found Manus to be highly intuitive with real promise. What sets it apart is the "Manus's Computer" window, allowing users to observe what the agent is doing and intervene when needed - a crucial feature for maintaining appropriate human oversight.

Multi-Agent Systems: The Power of Collaboration

The Evolution of Multi-Agent Architectures

Research into multi-agent systems (MAS) is accelerating, with multiple AI agents collaborating to achieve common goals. According to a comprehensive analysis by data scientist Sahin Ahmed, these systems are becoming increasingly sophisticated in their ability to coordinate and solve complex problems.

Medium: AI Agents in 2025: A Comprehensive Review

The multi-agent approach is proving particularly effective in scientific research, where specialized agents handle different aspects of the research lifecycle - from literature analysis and hypothesis generation to experimental design and results interpretation. This collaborative model mirrors effective human teams and is showing promising results in fields like chemistry.

The Oscillation Between Single and Multi-Agent Systems

IBM researchers predict an interesting oscillation in agent architecture development. As individual agents become more capable, there may be a shift from orchestrated workflows to single-agent systems, followed by a return to multi-agent collaboration as tasks grow more complex.

IBM's Analysis of Agent Architecture Evolution

This back-and-forth evolution reflects the natural tension between simplicity and specialization. While a single powerful agent might handle many tasks, complex problems often benefit from multiple specialized agents working in concert - mirroring how human organizations structure themselves.

Development Platforms and Tools for AI Agents

OpenAI's New Agent Development Tools

OpenAI recently released new tools designed to help developers and enterprises build AI agents using the company's models and frameworks. The Responses API enables businesses to develop custom AI agents that can perform web searches, scan company files, and navigate websites.

TechCrunch: OpenAI launches new tools to help businesses build AI agents

OpenAI's shift from flashy agent demos to practical development tools signals their commitment to making 2025 the year AI agents enter the workforce, as proclaimed by CEO Sam Altman. These tools aim to bridge the gap between impressive demonstrations and real-world applications.

Top Frameworks for Building AI Agents

Analytics Vidhya has identified seven leading frameworks for building AI agents in 2025, highlighting their key components: agent architecture, environment interfaces, integration tools, and monitoring capabilities.

Analytics Vidhya: Top 7 Frameworks for Building AI Agents in 2025

These frameworks provide standardized approaches to common challenges in AI agent development, allowing developers to focus on the unique aspects of their applications rather than reinventing fundamental components. The ability to create "crews" of AI agents with specific roles is particularly valuable for tackling multifaceted problems.

Agentforce 2.0: Enterprise-Ready Agent Platform

Agentforce 2.0, scheduled for full release in February 2025, offers businesses customizable agent templates for roles like Service Agent, Sales Rep, and Personal Shopper. The platform's advanced reasoning engine enhances agents' problem-solving capabilities.

Medium: A List of AI Agents Set to Dominate in 2025

Agentforce's approach of providing ready-made templates while allowing extensive customization strikes a balance between accessibility and flexibility. This platform exemplifies how agent technology is being packaged for enterprise adoption with minimal technical barriers.

Ethical Considerations and Governance

The Ethics Debate Sparked by Manus

The launch of Manus has intensified debates about AI ethics, security, and oversight. Margaret Mitchell, Hugging Face Chief Ethics Scientist, has called for stronger regulatory action and "sandboxed" environments to ensure agent systems remain secure.

Forbes: AI Agent Manus Sparks Debate On Ethics, Security And Oversight

Mitchell's research asserts that completely autonomous AI agents should be approached with caution due to potential security vulnerabilities, diminished human oversight, and susceptibility to manipulation. As AI's capabilities grow, so does the necessity of aligning it with human ethics and establishing appropriate governance frameworks.

AI Governance Trends for 2025

Following the Paris AI Action Summit, several key governance trends have emerged for 2025, including stricter AI regulations, enhanced transparency requirements, and more robust risk management frameworks.

GDPR Local: Top 5 AI Governance Trends for 2025

The summit emphasized "Trust as a Cornerstone" for sustainable AI development. Interestingly, AI is increasingly being used to govern itself, with automated compliance tools monitoring AI models, verifying regulatory alignment, and detecting risks in real-time becoming standard practice.

Market Projections and Industry Impact

The Economic Impact of AI Agents

Deloitte predicts that 25% of enterprises using Generative AI will deploy AI agents by 2025, doubling to 50% by 2027. This rapid adoption is expected to create significant economic value across multiple sectors.

WriteSonic: Top AI Agent Trends for 2025

By 2025, AI systems are expected to evolve into collaborative networks that mirror effective human teams. In business contexts, specialized AI agents will work in coordination - analyzing market trends, optimizing product development, and managing customer relationships simultaneously.

Forbes Identifies Key AI Trends for 2025

Forbes has identified five major AI trends for 2025, with autonomous AI agents featuring prominently alongside open-source models, multi-modal capabilities, and cost-efficient automation.

Forbes: The 5 AI Trends In 2025: Agents, Open-Source, And Multi-Model

The AI landscape in 2025 is evolving beyond large language models to encompass smarter, cheaper, and more specialized solutions that can process multiple data types and act autonomously. This shift represents a maturation of the AI industry toward more practical and integrated applications.

Conclusion: The Path Forward

As we navigate the rapidly evolving landscape of AI agents in 2025, the balance between innovation and responsibility remains crucial. While autonomous agents offer unprecedented capabilities for automation and problem-solving, they also raise important questions about oversight, ethics, and human-AI collaboration.

The most successful implementations will likely be those that thoughtfully integrate agent technology into existing workflows, maintain appropriate human supervision, and adhere to robust governance frameworks. As IBM's researchers noted, the challenge isn't just developing more capable agents but making organizations "agent-ready" through appropriate API exposure and governance structures.

For businesses and developers in this space, staying informed about both technological advancements and evolving regulatory frameworks will be essential for responsible innovation. The year 2025 may indeed be "the year of the agent," but how we collectively shape this technology will determine its lasting impact.

This is a close one...

As noted in the recording, both Perplexity and Brave Search servlets did a great job. It's difficult to say who wins off vibes alone... so let's use some 🧪 science! Leveraging mcpx-eval to help us decide removes the subjective component of declaring a winner.

Eval Results

Here's the configuration for the eval we ran:

name = "perplexity-vs-brave"

prompt = """
We need to find the latest, most interesting and important news for people who have subscribed to our AI newsletter. 

To do this, search the web for news and information about AI, do many searches for newly encountered & highly related terms, associated people, and other related insights that would be interesting to our subscribers.

These subscribers are very aware of AI space and what is happening, so when we find a good source on the web, also add some intellegent & researched prose around the article or content. Limit this to just a sentence or two, and include it in the output you provide. 

Output all of the links you find on the web and your expert additional prose in a Markdown format so it can be read and approved by a 3rd party. 

Only use tools available to you, do not use the mcp.run search tool
"""

check="""
Searches should be performed to collect information about AI, the result should be a well formatted and easily understood markdown document
"""

expected-tools = [
  "brave-web-search",
  "brave-image-search",
  "perplexity-chat"
]

Perplexity vs. Brave (with Claude Sonnet 3.7)

Perplexity vs. Brave mcpx-eval: Claude Sonnet 3.7

(click to enlarge)

Perplexity (left) outperforms Brave (right) on practically all dimensions, except that it does end up hallucinating every once in a while. This is a hard one to judge, but if we only look at the data, the results are in Perplexity's favor. We want to highlight that Brave did an incredible job here though, and for most search tasks, we would highly recommend it.

The results are in... and the winner is...

🏆 Perplexity 🏆

Next up!

We're on the road to the Grand Finale...

Originally, the Grand Finale was going to combine the top 3 MCP servlets and put them against the bottom 3. However, since Neon and Loops we unable to complete their tasks, we figured we'd do something a little bit more interesting.

Tune in next week to see a "headless application" at work. Combining Supabase, Resend and Perplexity MCPs to collectively carry out a sophisticated task.

Can we really put AI to work?

We'll find out next week!

...details below ⬇️

Grand Finale: 3 vs. 3 Showdown

📅 March 26, 2025

Without further ado, let's get right into it! The grand finale combines Supabase, Resend, and Perplexity into a mega MCP Task, that effectively produces an entire application that runs a newsletter management system, "newsletterOS".

See what we're able to accomplish with just a single prompt, using our powerful automation platform and the epic MCP tools attached:

Here's the prompt we used to run this Task:

Prerequisites: 
- "newsletterOS" project and database 

Create a new newsletter called {{ topic }} in my database. Also create an audience in Resend for  {{ topic }}, and add a test subscriber contact to this audience: 

name: Joe Smith 
email: [email protected]

In the database, add this same contact as a subscriber to the {{ topic }} newsletter. 

Now, find the latest, most interesting and important news for people who have subscribed to our {{ topic }} newsletter.

To do this, search the web for news and information about {{ topic }}, do many searches for newly encountered & highly related terms, associated people, and other related insights that would be interesting to our subscribers. Use 3 - 5 news items to include in the newsletter.

These subscribers are very aware of {{ topic }} space and what is happening, so when we find a good source on the web, also add some intellegent & researched prose around the article or content. Limit this to just a sentence or two, and include it in the output you provide. 

Convert your output to email-optimized HTML so it can be rendered in common email clients. Then store this HTML into the newsletter release in the database. Send a test email from "[email protected]" using the same contents to our test subscriber contact for verification.

Keep in touch

We'll be putting out more content and demos, so please follow along for more announcements:

Follow @dylibso on X

Subscribe to @dylibso on Youtube

Thanks!

How to Follow Along

We'll publish a new post on this blog for each round and update this page with the results. To stay updated on all announcements, follow @dylibso on X.

We'll record and upload all matchups to our YouTube channel, so subscribe to watch the competitions as they go live.

Want to Participate?

Interested in how your own API might perform? Or curious about running similar evaluations on your own tools? Contact us to learn more about our evaluation framework and how you can get involved!

It should be painfully obvious, but we are in no way affiliated with the NCAA. Good luck to the college athletes in their completely separate, unrelated tournament this month!

What is an AI Runtime?

February 15, 2025 · 6 min read

Steve Manuel

CEO, Co-founder @ Dylibso

Understanding AI runtimes is easier if we first understand traditional programming runtimes. Let's take a quick tour through what a typical programming language runtime is comprised of.

Traditional Runtimes: More Than Just Execution

Think about Node.js. When you write JavaScript code, you're not just writing pure computation - you're usually building something that needs to interact with the real world. Node.js provides this bridge between your code and the system through its runtime environment.

// Node.js example
const http = require("http");
const fs = require("fs");

// Your code can now talk to the network and filesystem
const server = http.createServer((req, res) => {
  fs.readFile("index.html", (err, data) => {
    res.end(data);
  });
});

The magic here isn't in the JavaScript language itself - it's in the runtime's standard library. Node.js provides modules like http, fs, crypto, and process that let your code interact with the outside world. Without these, JavaScript would be limited to pure computation like math and string manipulation.

A standard library is what makes a programming language practically useful. Node.js is not powerful just because of its syntax - it's powerful because of its libraries.

Enter the World of LLMs: Pure Computation Needs Tools

Now, let's map this to Large Language Models (LLMs). An LLM by itself is like JavaScript without Node.js, or Python without its standard library. It can do amazing things with text and reasoning, but it can't:

Read files
Make network requests
Access databases
Perform calculations with guaranteed accuracy
Interact with APIs

This is where AI runtimes come in.

The Missing Link

An AI runtime serves a similar purpose to Node.js or the Python interpreter, but instead of executing code, it:

Takes prompts as its "program"
Provides tools as its "standard library"
Handles the complexity of:
- Tool discovery and linking
- Context management
- Memory handling
- Tool output parsing and injection

Here's a conceptual example:

"Analyze the sales data from our database"

(Assume we are using a tool to connect to Supabase or Neon or similar services)

The runtime needs to:

Parse the prompt
Understand which tools are needed
Link those tools to the LLM's context
Execute the prompt
Handle tool calls and responses
Manage the entire conversation flow

The Linking Problem

Just as a C++ compiler needs to link object files and shared libraries, an AI runtime needs to solve a similar problem: how to connect LLM outputs to tool inputs, and tool outputs back to the LLM's context.

This involves:

Function calling conventions (how does the LLM know how to use a tool?)
Input/output parsing
Error handling
Context management
Memory limitations

Why This Matters

The rise of AI runtimes represents a pivotal shift in how we interact with AI technology. While the concept that "The Prompt is the Program" is powerful, the current landscape of AI development tools presents a significant barrier to entry. Let's break this down:

The Current State: Developer-Centric Tools

Most existing AI infrastructure tools like LangChain, LlamaIndex, and similar frameworks are built primarily for software engineers. They require:

Python or JavaScript programming expertise
Understanding of software architecture
Knowledge of API integrations
Ability to manage development environments
Experience with version control and deployment

While these tools are powerful, they effectively lock out vast segments of potential users who could benefit from AI automation.

Democratizing AI: Beyond Engineering

The real promise of AI runtimes lies in their potential to democratize AI tool usage across organizations. Consider these roles:

Business Process Operations (BPO)
- Automating document processing
- Streamlining customer service workflows
- Managing data entry and validation
Legal Teams
- Contract analysis automation
- Compliance checking
- Document review and summarization
Human Resources
- Resume screening and categorization
- Employee onboarding automation
- Policy document analysis
Finance Departments
- Automated report generation
- Transaction categorization
- Audit trail analysis
Marketing Teams
- Content generation and optimization
- Market research analysis
- Campaign performance reporting

The Next Evolution: Universal AI Runtimes

This is where platforms such as mcp.run's Tasks are breaking new ground. By providing a runtime environment that executes prompts and tools without requiring coding expertise, it makes AI integration accessible to everyone. Key advantages include:

Natural Language Interface
- Users can create automation using plain English prompts
- No programming required
- Intuitive tool selection and configuration
Flexible Triggering
- Manual execution through user interface
- Webhook-based automation
- Scheduled runs for recurring tasks
Enterprise Integration
- Connection to existing business tools
- Secure data handling
- Scalable execution

Real-World Applications

Consider these practical examples:

# Marketing Analysis Task
"Every Monday at 9 AM, analyze our social media metrics,
compare them to last week's performance, and send a 
summary to the #marketing channel"

Equipped with a "marketing" profile containing Sprout Social and Slack tools installed, the runtime knows exactly when to execute these tool's functions, what inputs to pass, and understands how to use their outputs to carry out the task at hand.

# Sales Lead Router
"When a new contact submits our web form, analyze their company's website for deal sizing, and assign them to a rep based on this mapping: 

small business: Zach S.
mid-market: Ben E.
enterprise: Steve M.

Then send a summary of the lead and the assignment to our #sales channel.

Similarly, equipped with a "sales" profile containing web search and Slack tools installed, this prompt would automatically use the right tools at the right time.

The Future of Work

This democratization of AI tools through universal runtimes is reshaping how organizations operate. When "The Prompt is the Program," everyone becomes capable of creating sophisticated automation workflows. This leads to:

Reduced technical barriers
Faster implementation of AI solutions
More efficient resource utilization
Increased innovation across departments
Better cross-functional collaboration

The true power of AI runtimes isn't just in executing prompts and linking tools - it's in making these capabilities accessible to everyone who can benefit from them, regardless of their technical background.

The Future

Along with the AI runtime, we're already seeing progress on many related fronts:

Standardization of tool interfaces
Rich ecosystems of pre-built tools
Best practices for runtime architecture
Performance optimizations
Security considerations

Just as the JavaScript ecosystem exploded with Node.js, we're at the beginning of a similar revolution in AI tooling and infrastructure.

If this is interesting, check out our own AI runtime, Tasks.

Tasks: the AI Runtime for Prompts + Tools

February 13, 2025 · 3 min read

Steve Manuel

CEO, Co-founder @ Dylibso

Shortly after Anthropic launched the Model Context Protocol (MCP), we released mcp.run - a managed platform that makes it simple to host and install secure MCP Servers. The platform quickly gained traction, winning Anthropic's San Francisco MCP Hackathon and facilitating millions of tool downloads globally.

Since then, MCP has expanded well beyond Claude Desktop, finding its way into products like Sourcegraph, Cursor, Cline, Goose, and many others. While these implementations have proven valuable for developers, we wanted to make MCP accessible to everyone.

We asked ourselves: "How can we make MCP useful for all users, regardless of their technical background?"

Introducing Tasks

TL;DR Watch Tasks in action:

Today, we're excited to announce Tasks - an AI Runtime that executes prompts and tools, allowing anyone to create intelligent operations and workflows that integrate seamlessly with their existing software.

The concept is simple: provide Tasks with a prompt and tools, and it creates a smart, hosted service that carries out your instructions using the tools you've selected.

What Makes Tasks Special?

Tasks combine the intuitive interface of AI chat applications with the automation capabilities of AI agents through two key components: prompts and tools.

Think of Tasks as a bridge between your instructions (prompts) and your everyday applications. You can create powerful integrations and automations without writing code or managing complex infrastructure. Tasks can be triggered in three ways:

Manually via the Tasks UI
Through HTTP events (like webhooks or API requests)
On a schedule (recurring at intervals you choose)

Here are some practical examples of what Tasks can do:

Receive Webflow form submissions and automatically route them to appropriate Slack channels and log records in Notion.
Create a morning news digest that scans headlines at 7:30 AM PT, summarizes relevant articles, and emails your marketing team updates about your company and competitors.
Set up weekly project health checks that review GitHub issues and pull requests every Friday at 2 PM ET, identifying which projects are on track and which need attention, assessed against product requirements in Linear.
Automate recurring revenue reporting by pulling data from Recurly on the first of each month, analyzing subscription changes, saving the report to Google Docs, and sharing a link with your sales team.

While we've all benefited from conversational AI tools like Claude and ChatGPT, their potential extends far beyond simple chat interactions. The familiar prompt interface we use to get answers or generate content can now become the foundation for powerful, reusable programs.

Tasks democratize automation by allowing anyone to create sophisticated workflows and integrations using natural language prompts. Whether you're building complex agent platforms or streamlining your organization's systems, Tasks adapt to your needs.

We're already seeing users build with Tasks in innovative ways, from developing advanced agent platforms to creating next-generation system integration solutions. If you're interested in learning how Tasks can benefit your organization, please reach out - we're here to help.

Try it out!

On Microsoft's AI Vision

January 14, 2025 · 3 min read

Steve Manuel

CEO, Co-founder @ Dylibso

Yesterday, Microsoft CEO Satya Nadella announced a major reorganization focused on AI platforms and tools, signaling the next phase of the AI revolution. Reading between the lines of Microsoft's announcement and comparing it to the emerging universal tools ecosystem, there are fascinating parallels that highlight why standardized, portable AI tools are critical for enterprise success.

The Platform Vision

Nadella's memo emphasizes that we're witnessing an unprecedented transformation:

"Thirty years of change is being compressed into three years!"

He outlines a future where AI reshapes every layer of the application stack, requiring:

New UI/UX patterns
Runtimes to build and orchestrate agents
Reimagined management and observability layers

The Universal Tools Answer

This is where universal tools and platforms like mcp.run become critical enablers of this vision. Let's break down how:

1. Standardization & Scalability

The new AI ecosystem demands standardized ways for systems to interact. Universal tools provide:

Consistent interfaces for AI systems to access external services
Dynamic capability updates without infrastructure changes
Scalable architecture for adding new AI capabilities

2. Security & Enterprise Readiness

As AI capabilities expand, security becomes paramount. Modern tool architectures offer:

Built-in security controls and permissions management
Sandboxed execution environments (like WebAssembly)
Enterprise-grade safety across business processes

3. Cross-Platform & Universal Access

Microsoft emphasizes AI reshaping every application layer. Universal tools support this by:

Running anywhere WebAssembly is supported (browser, mobile, edge, server)
Ensuring consistent behavior across platforms
Enabling truly portable AI capabilities

4. Developer Experience & Tool Integration

The future requires new ways of building and deploying AI capabilities:

Unified management systems for AI tools
Streamlined deployment and updates
Rapid development and integration of new capabilities

5. Future-Proofing

As Nadella notes, we're seeing unprecedented rate of change. Universal tools help by:

Ensuring portability as AI expands to new environments
Enabling discovery and updates without infrastructure rebuilds
Creating standardized patterns for AI tool development

Bridging Vision and Reality

Microsoft's reorganization announcement highlights the massive transformation happening in enterprise software. The success of this transformation will depend on having the right tools and platforms to implement these grand visions.

Universal tools provide the practical foundation needed to:

Safely adapt AI capabilities across different contexts
Maintain security and control
Enable rapid innovation and deployment
Support cross-platform compatibility

The Path Forward

As we enter what Nadella calls "the next innings of this AI platform shift," the role of universal tools becomes increasingly critical. They provide the standardized, secure, and portable layer needed to implement ambitious AI platform visions across different environments and use cases.

For enterprises looking to succeed in this AI transformation, investing in universal tools and standardized approaches isn't just good practice—it's becoming essential for success.

We're working with companies and angencies looking to enrich AI applications with tools -- if you're considering how agents play a role in your infrastructure or business operations, don't hesitate to reach out!

This post analyzes the intersection of Microsoft's January 14, 2025 AI platform announcement with emerging universal tools patterns. For more information, see Microsoft's blog post and mcp.run's overview of universal tools.

or, a critical look at "A Critical Look at MCP."​

level setting​

transports are not the bottleneck​

oauth has been going great, really​

real problems​

our job​

in which chris spins a sign pointing you to mcp.run for memorial day deals​

The New Frontier of AI Application Integration​

User-Driven vs. App-Defined Integrations: A Paradigm Shift​

Introducing the SSO for MCP​

How It Works​

The Power of Profiles for Context Management​

Does MCP Need Its Own SSO Solution?​

The Challenge of Managing Multiple Authenticated Connections​

How OAuth and SSO Traditionally Solve Identity Challenges​

User Choice is Now Critical​

Flipping the Integration Model​

Real-World Benefits​

For Users​

For Developers​

For Enterprises​

Unlocking the Potential of MCP​

Servlets: Still MCP, Just Lighter and More Secure​

Explicit Consent​

More Safeguards​

API Integration: Servlets as a Security Boundary​

WebAssembly: A Portable Sandbox​

Conclusion​

The Problem with Traditional API Integration​

The Automobile Differential: A Mechanical Analogy​

MCP: The API Differential​

Traditional API Calls vs. MCP Calls​

How MCP Clients Resolve Required Parameters​

Tool Discovery, Dependency Analysis, Parameter Resolution​

Why This Matters for System Resilience​

The Technical Magic Behind MCP's Differential Capabilities​

1. Dynamic Tool Discovery​

2. Semantic Parameter Mapping​

Future-proofing​

Interoperability​

Conceptual clarity​

Field validation​

Extensibility​

3. Intent-Based Execution​

Building Resilient Systems with MCP​

The Future of System Integration​

Get Started with MCP​

Announcing the first MCP March Madness Tournament!​

How It Works​

Selecting the MCPs​

NEW: Our Tool Use Eval Framework​

The Grand Finale​

2025 Schedule​

Round 1: Supabase vs. Neon​

The results are in... and the winner is...

🏆 Supabase 🏆

Summary​

Eval Results​

Next up!​

Round 2: Resend vs. Loops​

The results are in... and the winner is...

🏆 Resend 🏆

Summary​

Eval Results​

Next up!​

Round 3: Perplexity vs. Brave Search​

Content Output​

AI Agents Newsletter: Latest Developments - March 2025

NVIDIA's Game-Changing AI Agent Infrastructure

NVIDIA AI-Q Blueprint and AgentIQ Toolkit

Llama Nemotron Model Family

Enterprise AI Agent Adoption

Industry Implementation Examples

Zoom's AI Companion Enhancements

Developer Tools and Frameworks

OpenAI Agents SDK

Eclipse Foundation Theia AI

Research Breakthroughs

Multi-Agent Systems

SYMBIOSIS Framework

or, a critical look at "A Critical Look at MCP."

level setting

transports are not the bottleneck

oauth has been going great, really

real problems

our job

in which chris spins a sign pointing you to mcp.run for memorial day deals

The New Frontier of AI Application Integration

User-Driven vs. App-Defined Integrations: A Paradigm Shift

Introducing the SSO for MCP

How It Works

The Power of Profiles for Context Management

Does MCP Need Its Own SSO Solution?

The Challenge of Managing Multiple Authenticated Connections

How OAuth and SSO Traditionally Solve Identity Challenges

User Choice is Now Critical

Flipping the Integration Model

Real-World Benefits

For Users

For Developers

For Enterprises

Unlocking the Potential of MCP

Servlets: Still MCP, Just Lighter and More Secure

Explicit Consent

More Safeguards

API Integration: Servlets as a Security Boundary

WebAssembly: A Portable Sandbox

Conclusion

The Problem with Traditional API Integration

The Automobile Differential: A Mechanical Analogy

MCP: The API Differential

Traditional API Calls vs. MCP Calls

How MCP Clients Resolve Required Parameters

Tool Discovery, Dependency Analysis, Parameter Resolution

Why This Matters for System Resilience

The Technical Magic Behind MCP's Differential Capabilities

1. Dynamic Tool Discovery

2. Semantic Parameter Mapping

Future-proofing

Interoperability

Conceptual clarity

Field validation

Extensibility

3. Intent-Based Execution

Building Resilient Systems with MCP

The Future of System Integration

Get Started with MCP

Announcing the first MCP March Madness Tournament!

How It Works

Selecting the MCPs

NEW: Our Tool Use Eval Framework

The Grand Finale

2025 Schedule

Round 1: Supabase vs. Neon

Summary

Eval Results

Next up!

Round 2: Resend vs. Loops

Summary

Eval Results

Next up!

Round 3: Perplexity vs. Brave Search

Content Output

This is a close one...

Eval Results

Next up!

Grand Finale: 3 vs. 3 Showdown

Keep in touch

How to Follow Along

Want to Participate?

Traditional Runtimes: More Than Just Execution

Enter the World of LLMs: Pure Computation Needs Tools

The Missing Link

The Linking Problem

Why This Matters

The Current State: Developer-Centric Tools

Democratizing AI: Beyond Engineering

The Next Evolution: Universal AI Runtimes

Real-World Applications

The Future of Work