This stage focuses on creating an enterprise-ready AI capability designed to be consumed by applications. Instead of building a UI-bound demo, we build and deploy an LLM-powered application using the Mule framework, independent of any specific user interface.
Modern LLM-driven systems separate user interaction, application responsibility, and AI execution into distinct layers.
In this workshop module, we use the above generative AI application architecture as a reference point, not as something we fully implement. Rather than building a user interface, we focus on delivering a core AI capability that fits cleanly into the architecture shown at the top of this document.
In a complete LLM-driven application, the user interface is responsible only for interaction — collecting user input and displaying responses. This interface could be a web UI, a mobile app, a backend service, or a Python application (e.g., with a Streamlit library based UI). The choice of UI is intentionally left open in this stage.
The application stack sits behind the user interface and controls application behavior. This layer is responsible for structuring requests, managing conversation state if required, deciding what context is sent to the model, handling errors, and invoking downstream capabilities. In this workshop, the application stack is represented conceptually rather than implemented, because our goal is to build the AI execution layer it would call.
When the application stack requires an AI response, it does not call the language model directly. Instead, it sends a request to Mule, which acts as a stateless orchestrator. Mule executes inference using the Mule Inference Connector, applies configured model parameters, and provides secure, observable access to the underlying LLM (Groq). Mule does not manage conversational state; it executes each inference request exactly as it is received.
This separation is intentional. The user interface handles interaction, the application stack handles behavior, and Mule handles AI execution. By isolating inference execution behind Mule, the AI capability becomes reusable, governed, and independent of any specific application or UI technology.
At this stage, the agent has intelligence but no enterprise context. There is no grounding, no access to internal systems, and no business logic. This allows you to clearly observe what managed LLM inference provides on its own, and where its limitations begin.
In more advanced systems, additional architectural components are introduced — such as a task planner (reasoning engine), context data stores (for example, vector databases), and prompt management layers — along with expanded governance and observability. Those elements are intentionally deferred to later stages so the foundational execution model remains clear.
MuleSoft provides a suite of Anypoint AI Connectors that integrate Large Language Models (LLMs) and Vector Stores directly into existing business workflows. They provide a unified interface for developers to build and manage enterprise-grade AI agents that can access both Salesforce and non-Salesforce data.
By abstracting the technical complexities of different AI technologies, these connectors simplify the development of autonomous agents and coordinate interactions between various models and enterprise applications.
These Mule AI connectors also standardize how AI components interact and collaborate, leading to more integrated and secure AI solutions across the organization. This standardization ensures that agents can reliably access up-to-date data and perform consistent actions within a governed framework.
The suite available in Anypoint Exchange evolved from the MuleSoft AI Chain project, an open-source initiative that allows developers to build and manage agents natively within the Anypoint Platform. This relationship allows MuleSoft to apply the same full-lifecycle management principles to AI agents that it has traditionally used for APIs and integrations.
For this workshop, please sign in and start your Windows App - this app is provided to you by your instructor and contains the virtual machine with MuleSoft Anypoint Studio installed. This is the tool we shall use in this section.
In the Mule Virtual Machine, provided to you open the Mule Anypoint Studio.
To save time, instead of assembling the flows component-by-component, we’ve provided a packaged JAR file in which the key steps have already been completed. You’ll import it into Anypoint Studio, run it locally, and deploy it to CloudHub. This section walks you through the entire process end-to-end—from import to deployment and testing.
Although you are using a pre-packaged JAR rather than configuring each connector manually in Anypoint Studio, we still explain every part of it in the sections below, so you understand exactly how it works and how to apply the same approach in your own projects.
In your Mule VM (using the Chrome browser) download the following JAR file: Download JAR File
Typically it will download the file in the following locations inside the VM: C:\Users\workshop\Downloads
Open Mule Anypoint Studio in your VM (Windows App provided to you by your instructor).
Click on File → Import and find the downloaded JAR (stage1-lab-inference-chat.jar) on your VM.
Choose the import wizard → Anypoint Studio → Packaged Application
You can monitor the import progress in the bottom right corner of the Anypoint Studio.
Once import completes, a project gets created in Anypoint Studio with the configurations packaged in the JAR file.
Click on /src/main/resources/config.properties in your project in Anypoint Studio.
Put in your Groq API key value on the first line. Then save the file (File → Save).
Now let’s test your project locally. Right click on your project in the Mule Anypoint Studio and click on Run As → Mule Application.
You should see Deployed in the bottom screen in Mule Anypoint Studio.
Send a message to the LLM via your Inference Connector:
Copy paste this in your windows shell.
curl -X POST http://localhost:8081/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"
You should see the following output.
Now let’s deploy your project to CloudHub.
Right click on your project → Anypoint Platform → Deploy to CloudHub
It will bring up a browser window inside your Anypoint Studio - authenticate with your Mule trial username.
Select No for creating a password hint.
After sign-in, go back to your project - Right Click → Anypoint Platform → Deploy to CloudHub - select Sandbox
In the next page, click on Properties and put in your groq.apiKey, groq.model and Connection.
groq.model=llama-3.3-70b-versatile
groq.apiKey={yourAPIKEY}
Connection=groq
Following is a screenshot of the CloudHub property page.
Then press Deploy Application button.
It takes about 10 minutes to deploy.
To monitor deployment on CloudHub - go to mulesoft.com and login with your trial account → Runtime Manager
You should see your application running on CloudHub Runtime Manager.
Let’s test your application.
Click on your application and on your application page copy the public endpoint.
Let’s send a question to your application via CURL from your Windows command shell. Remember to replace the URL in the following command with your public endpoint.
curl -X POST https://stage1-customer-inference-chat-l0r84xyz.usa-e2.cloudhub.io/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"
https://thin-chat-client-085335f023d7.herokuapp.com/
This GUI client lets you interact with the headless agent powered by the Mule Inference Connector on CloudHub.
In this section we focused on building an enterprise-ready AI capability that applications can consume — not a UI-bound demo.
By completing Stage 1, you have built and deployed a production-style, enterprise-grade LLM inference endpoint using Mule.
You did not build a UI in this stage — intentionally. Instead, you focused on creating a clean, reusable AI capability that can be consumed by any application.
Specifically, you have:
POST /chat) hosted inside Mule. This endpoint accepts structured chat input and returns a model response, making it suitable for use by web applications, backend services, agents, or automation workflows.By doing this, you gained several concrete enterprise capabilities automatically:
At this stage, the capability has intelligence but no enterprise context. It is not grounded in internal data, does not call backend systems, and does not execute business actions. This is intentional. You now have a clean, well-defined baseline that demonstrates what managed LLM inference provides on its own.
This endpoint can now be wrapped by any application — for example, a Python application with a Streamlit-based UI, a web frontend, an internal service, or another agent — without changing the Mule implementation.
This completes Stage 1 and sets the foundation for adding grounding, enterprise data, and multi-agent orchestration in the next stages.
Studio will generate a project skeleton.
In a Mule application, a flow always starts with a source. A source is the component that waits for an external event and starts the flow when that event occurs.
In this project, the external event is an HTTP request sent by the Python application.
When a user submits a message, the Python application sends an HTTP request to Mule. The HTTP Listener is the component that receives this request.
HTTP > Listener:
Once the HTTP Listener receives the request, the flow routes the request to the Mule Inference Connector, which executes the inference against the configured language model and returns the response.
In simple terms: The HTTP Listener receives the user request, and the Inference Connector handles the AI execution.
To configure an HTTP > Listener source, follow these steps:
To enable inference in your Mule application, add the Mule Inference Connector to your Mule project. When you add the connector, Anypoint Studio automatically:
pom.xml fileThis allows your Mule flow to invoke a language model for inference in a managed, secure, and observable way.
mulesoft inference in the search field.When you add a connector operation to your flow, you are specifying an action for that connector to perform.
To add an operation for MuleSoft Inference Connector, follow these steps:
When you configure the connector, configure a global element - configuring a global element requires you to provide the authentication credentials that the connector requires to access the target inference provider.
Following are the main values you need for global element configuration:
Click on Test Connection.
Click Ok.
What our application in Anypoint Studio does: We created an API that exposes POST /chat and delegates inference to the Inference Connector.
Let’s test it now.
Select your project in the left sidebar in Mule Anypoint Studio. Then right click and select ‘Run as - Mule Application.’
In a Mac terminal window paste this curl command:
curl -X POST http://localhost:8081/chat \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'
You should see the following response: {"response":"The capital of Germany is Berlin."}
Windows: If you are using Windows (e.g., your Mule VM is Windows)- use the following CURL command.
curl -X POST http://localhost:8081/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"
In the Properties tab, set the environment variables as follows:
groq.apiKey= This is your groq api key
Connection= Groq
groqModelName = llama-3.3-70b-versatile
Deploy Application
This takes time and you can monitor the progress in Anypoint Studio console.
Now login to MuleSoft using your trial account in the browser - and navigate to Runtimes → Runtime Manager → Sandbox.
Click on your application (it should say Running).
Copy the public endpoint that you see on top right of the application page, e.g., https://inference-chat-xyz111.usa-e2.cloudhub.io/
Test your application
In a terminal window paste this curl command:
curl -X POST "https://inference-chat-xyz111.usa-e2.cloudhub.io/chat" \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'
You should see the correct response displayed in your terminal window.
https://thin-chat-client-085335f023d7.herokuapp.com/
This GUI client lets you interact with the headless agent powered by the Mule Inference Connector on CloudHub.
In this scenario, we completely skip the Mule Anypoint Studio tool.
We import the instructor provided JAR file directly in Cloudhub in your Mule trial account.
x.jar where x is the name of the jar file provided by the instructor.Set in your values for:
groq.model=llama-3.3-70b-versatilegroq.apiKey=your_keyDeploy.
This takes time and you can monitor the progress in Anypoint Studio console.
Now login to MuleSoft using your trial account in the browser - and navigate to Runtimes → Runtime Manager → Sandbox.
Click on your application (it should say Running).
Copy the Public Endpoint that you see on top right of the application page.
e.g., https://inference-chat-xyz111.usa-e2.cloudhub.io/
Test your application.
In a Mac terminal window paste this curl command (replace the endpoint with your own endpoint):
curl -X POST "https://inference-chat-xyz111.usa-e2.cloudhub.io/chat" \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'
You should see the correct response displayed in your terminal window.
/chat or /xYou will get: x.jar where x is the name of your current project.
When creating a new Mule project, you can skip the build, you can simply import a JAR file from a previously completed project. E.g., your instructor may have provided you with a JAR file.
A Mule project is generated with you with all the elements you need (from the project that was bundled in the imported JAR file).
Click on Chat Completions in the Message Flow window (in the middle of the screen in Anypoint Studio.
Click on the file config.properties in the left sidebar (/src/main/resources)
Set in your values for:
groq.model=llama-3.3-70b-versatilegroq.apiKey=your_keySave your project.
Now you continue with local testing and then deploying on Cloudhub from Anypoint Studio.
How to authenticate in Anypoint Studio with your Mule account: https://docs.mulesoft.com/studio/set-credentials-in-studio-to
Traditional enterprise API development in Mule follows a deliberate, governed lifecycle. Each stage has a clear purpose and associated Mule capabilities
What happens
Mule features
Why this matters
This is the foundation for governance, security, and reuse.
What happens
Mule features
Why this matters
What happens
Mule features
Why this matters
What happens
Mule features
Why this matters
What happens
Mule features
Why this matters
This approach matters because it gives you:
None of these problems disappear with AI. In fact, AI increases the blast radius if these controls are missing.
What Changes in an AI World (and What Does Not)
AI introduces a new execution model.
What changes
What does NOT change
The key architectural shift: Not everything exposed to an AI agent should be a public, consumer-facing API — but everything the agent can reach must still be governed.
This is where Mule becomes more important, not less.
Now let’s connect the full cycle API development concepts directly to our workshop lab module (Stage 1: Enterprise-Grade LLM Activation with Mule Inference Connector).
In Stage 1 of the workshop, we intentionally:
Instead, we created:
This maps cleanly to full-cycle concepts: System API= LLM inference capability, Connector= Inference Connector (Groq), Runtime= CloudHub, Security= Environment-scoped secrets, Monitoring= Mule runtime metrics.
This is Stage-0 of an API network, not a violation of it.
In a real production implementation, you would layer on full-cycle stages around this capability:
Full-cycle API development is not replaced by AI. It becomes the governance skeleton that allows AI systems to operate safely. Our workshop stages do not bypass API discipline — they reorder it: First, establish a governed execution fabric for AI. Then, layer contracts, orchestration, and experiences on top.
Stage 1 of our workshop highlights that AI execution itself must be treated as a first-class enterprise capability, subject to the same lifecycle thinking that made API programs successful in the first place.