Stage 1: Enterprise-Grade LLM Activation with Mule Inference Connector

This stage focuses on creating an enterprise-ready AI capability designed to be consumed by applications. Instead of building a UI-bound demo, we build and deploy an LLM-powered application using the Mule framework, independent of any specific user interface.

Introduction & Technical Architecture

Architecture Diagram

Modern LLM-driven systems separate user interaction, application responsibility, and AI execution into distinct layers.

In this workshop module, we use the above generative AI application architecture as a reference point, not as something we fully implement. Rather than building a user interface, we focus on delivering a core AI capability that fits cleanly into the architecture shown at the top of this document.

In a complete LLM-driven application, the user interface is responsible only for interaction: collecting user input and displaying responses. This interface could be a web UI, a mobile app, a backend service, or a Python application (for example, one with a Streamlit-based UI). The choice of UI is intentionally left open in this stage.

The application stack sits behind the user interface and controls application behavior. This layer is responsible for structuring requests, managing conversation state if required, deciding what context is sent to the model, handling errors, and invoking downstream capabilities. In this workshop, the application stack is represented conceptually rather than implemented, because our goal is to build the AI execution layer it would call.

When the application stack requires an AI response, it does not call the language model directly. Instead, it sends a request to Mule, which acts as a stateless orchestrator. Mule executes inference using the Mule Inference Connector, applies configured model parameters, and provides secure, observable access to the underlying LLM (served by Groq in this workshop). Mule does not manage conversational state; it executes each inference request exactly as it is received.

This separation is intentional. The user interface handles interaction, the application stack handles behavior, and Mule handles AI execution. By isolating inference execution behind Mule, the AI capability becomes reusable, governed, and independent of any specific application or UI technology.
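The division of responsibilities described above can be sketched from the application stack's side. This is a minimal illustration, not part of the workshop project: the `Conversation` class is hypothetical, and the `assistant` role for model replies is an assumption (the workshop payloads only show the `user` role). It shows that, because Mule is stateless, the caller keeps the history and sends all of it with every request.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical application-stack helper that owns the chat history."""

    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # "user" matches the workshop payload; "assistant" is an assumed
        # role name for model replies.
        self.messages.append({"role": role, "content": content})

    def to_request_body(self) -> bytes:
        # Mule keeps no state between calls, so each request carries the
        # full history the model should see.
        return json.dumps(self.messages).encode("utf-8")


chat = Conversation()
chat.add("user", "Capital of Germany?")
chat.add("assistant", "The capital of Germany is Berlin.")
chat.add("user", "And its population?")
print(chat.to_request_body().decode("utf-8"))
```

Whether the deployed flow accepts multi-turn histories depends on how the Mule application is configured, so treat the three-message body as an illustration of where state lives rather than a tested request.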

At this stage, the agent has intelligence but no enterprise context. There is no grounding, no access to internal systems, and no business logic. This allows you to clearly observe what managed LLM inference provides on its own, and where its limitations begin.

In more advanced systems, additional architectural components are introduced — such as a task planner (reasoning engine), context data stores (for example, vector databases), and prompt management layers — along with expanded governance and observability. Those elements are intentionally deferred to later stages so the foundational execution model remains clear.

How Mule Supports AI Development

MuleSoft provides a suite of Anypoint AI Connectors that integrate Large Language Models (LLMs) and Vector Stores directly into existing business workflows. They provide a unified interface for developers to build and manage enterprise-grade AI agents that can access both Salesforce and non-Salesforce data.

By abstracting the technical complexities of different AI technologies, these connectors simplify the development of autonomous agents and coordinate interactions between various models and enterprise applications.

These Mule AI connectors also standardize how AI components interact and collaborate, leading to more integrated and secure AI solutions across the organization. This standardization ensures that agents can reliably access up-to-date data and perform consistent actions within a governed framework.

The suite available in Anypoint Exchange evolved from the MuleSoft AI Chain project, an open-source initiative that allows developers to build and manage agents natively within the Anypoint Platform. This relationship allows MuleSoft to apply the same full-lifecycle management principles to AI agents that it has traditionally used for APIs and integrations.


Hands-On: Building an Enterprise Grade Gen AI App


Use Mule Anypoint Studio to Create a Mule Project with Inference Connector

For this workshop, sign in and start your Windows App. Your instructor provides this app, which contains the virtual machine with MuleSoft Anypoint Studio installed. Anypoint Studio is the tool we use in this section.

In the Mule virtual machine provided to you, open Anypoint Studio.

To save time, instead of assembling the flows component-by-component, we’ve provided a packaged JAR file in which the key steps have already been completed. You’ll import it into Anypoint Studio, run it locally, and deploy it to CloudHub. This section walks you through the entire process end-to-end—from import to deployment and testing.

Although you are using a pre-packaged JAR rather than configuring each connector manually in Anypoint Studio, we still explain every part of it in the sections below, so you understand exactly how it works and how to apply the same approach in your own projects.

In your Mule VM (using the Chrome browser) download the following JAR file: Download JAR File

Typically, the file is saved to the following location inside the VM: C:\Users\workshop\Downloads

Open Mule Anypoint Studio in your VM (Windows App provided to you by your instructor).

Click on File → Import and find the downloaded JAR (stage1-lab-inference-chat.jar) on your VM.

Import Wizard

Choose the import wizard → Anypoint Studio → Packaged Application

You can monitor the import progress in the bottom-right corner of Anypoint Studio.

Once import completes, a project gets created in Anypoint Studio with the configurations packaged in the JAR file.

Project Created

What is this Mule app that you just imported? A later section (How To Build It: Step-by-Step Guide) explains how you would assemble this app from scratch.

Click on /src/main/resources/config.properties in your project in Anypoint Studio.

Put in your Groq API key value on the first line. Then save the file (File → Save).

Config Properties

Now let’s test your project locally. Right-click your project in Anypoint Studio and select Run As → Mule Application.

Run as Mule Application

You should see Deployed in the console at the bottom of Anypoint Studio.

Deployed Status

Send a message to the LLM via your Inference Connector:

Copy and paste this into your Windows command shell.

curl -X POST http://localhost:8081/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"

You should see the following output.

cURL Output

Now let’s deploy your project to CloudHub.

Right-click your project → Anypoint Platform → Deploy to CloudHub

A browser window opens inside Anypoint Studio. Authenticate with your Mule trial username.

Deploy to CloudHub Auth

Select No for creating a password hint.

After you sign in, go back to your project, right-click it, select Anypoint Platform → Deploy to CloudHub, and then select Sandbox.

Select Sandbox

On the next page, click Properties and enter values for groq.apiKey, groq.model, and Connection.

groq.model=llama-3.3-70b-versatile
groq.apiKey={yourAPIKEY}
Connection=groq

Following is a screenshot of the CloudHub property page.

CloudHub Properties

Then click the Deploy Application button.

It takes about 10 minutes to deploy.

Deploying

To monitor the deployment on CloudHub, go to mulesoft.com, log in with your trial account, and open Runtime Manager.

Runtime Manager Menu

You should see your application running on CloudHub Runtime Manager.

App Running in RM

Gotcha: If you cannot deploy from Anypoint Studio (possibly because of enhanced permission requirements), deploy directly from CloudHub instead. Go to mulesoft.com (sign in) → Runtime Manager → Deploy Application. Upload your JAR file, then choose the Properties tab and fill in the property values as above. Then click Deploy Application.

Let’s test your application.

Click on your application and on your application page copy the public endpoint.

Public Endpoint

Let’s send a question to your application with curl from your Windows command shell. Remember to replace the URL in the following command with your public endpoint.

curl -X POST https://stage1-customer-inference-chat-l0r84xyz.usa-e2.cloudhub.io/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"

CloudHub cURL Output

We have built a UI-based client you can use to test your headless agentic capability.

UI Client

https://thin-chat-client-085335f023d7.herokuapp.com/

This GUI client lets you interact with the headless agent powered by the Mule Inference Connector on CloudHub.


What You Have Accomplished

In this section we focused on building an enterprise-ready AI capability that applications can consume — not a UI-bound demo.

By completing Stage 1, you have built and deployed a production-style, enterprise-grade LLM inference endpoint using Mule.

You did not build a UI in this stage — intentionally. Instead, you focused on creating a clean, reusable AI capability that can be consumed by any application.

Specifically, you have:

By doing this, you gained several concrete enterprise capabilities automatically:

At this stage, the capability has intelligence but no enterprise context. It is not grounded in internal data, does not call backend systems, and does not execute business actions. This is intentional. You now have a clean, well-defined baseline that demonstrates what managed LLM inference provides on its own.

This endpoint can now be wrapped by any application — for example, a Python application with a Streamlit-based UI, a web frontend, an internal service, or another agent — without changing the Mule implementation.
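As an illustration of such a wrapper, the following minimal Python sketch calls the endpoint using only the standard library. The function names `chat_url`, `build_request`, and `ask` are hypothetical; the payload mirrors the curl commands in this document, and the `response` field is assumed from the sample output shown in the step-by-step guide.

```python
import json
import urllib.request


def chat_url(public_endpoint: str) -> str:
    # The public endpoint copied from Runtime Manager may end with "/".
    return public_endpoint.rstrip("/") + "/chat"


def build_request(public_endpoint: str, question: str) -> urllib.request.Request:
    # Same payload as the curl commands: a JSON array of role/content messages.
    body = json.dumps([{"role": "user", "content": question}]).encode("utf-8")
    return urllib.request.Request(
        chat_url(public_endpoint),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(public_endpoint: str, question: str, timeout: float = 30.0) -> str:
    request = build_request(public_endpoint, question)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        payload = json.loads(reply.read().decode("utf-8"))
    # Assumes the {"response": "..."} shape; adjust if your flow differs.
    return payload["response"]
```

Calling `ask("https://inference-chat-xyz111.usa-e2.cloudhub.io/", "Capital of Germany?")` with your own public endpoint in place of the placeholder should return the model's answer as a string. A Streamlit UI or another service could call `ask` without any change on the Mule side.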

This completes Stage 1 and sets the foundation for adding grounding, enterprise data, and multi-agent orchestration in the next stages.


How To Build It: Step-by-Step Guide

Step 1 — Create your first Mule project

Studio will generate a project skeleton.

Step 2 — Configure a Source

In a Mule application, a flow always starts with a source. A source is the component that waits for an external event and starts the flow when that event occurs.

In this project, the external event is an HTTP request sent by the Python application.

When a user submits a message, the Python application sends an HTTP request to Mule. The HTTP Listener is the component that receives this request.

HTTP > Listener:

Once the HTTP Listener receives the request, the flow routes the request to the Mule Inference Connector, which executes the inference against the configured language model and returns the response.

In simple terms: The HTTP Listener receives the user request, and the Inference Connector handles the AI execution.
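The source-then-operation pattern can be illustrated outside Mule with a small stand-in. This is not how Mule implements it: `ChatHandler` plays the role of the HTTP Listener, `fake_inference` is a hypothetical placeholder for the Inference Connector (it calls no model), and the `{"response": ...}` shape is assumed from the sample output in Step 5. Such a stub can also serve as a local target when developing a client without an API key.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_inference(messages):
    # Stand-in for the Inference Connector: echoes the last message.
    return "echo: " + messages[-1]["content"]


class ChatHandler(BaseHTTPRequestHandler):
    """Plays the source: waits for POST /chat, then starts the 'flow'."""

    def do_POST(self):
        if self.path != "/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        messages = json.loads(self.rfile.read(length))
        body = json.dumps({"response": fake_inference(messages)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def post_chat(port, messages):
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/chat",
        data=json.dumps(messages).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any configured HTTP proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as reply:
        return json.loads(reply.read())


server = HTTPServer(("127.0.0.1", 0), ChatHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = post_chat(server.server_port, [{"role": "user", "content": "Capital of Germany?"}])
print(result)
server.shutdown()
server.server_close()
```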

To configure an HTTP > Listener source, follow these steps:

  1. In Mule Palette, select HTTP > Listener.
  2. Drag Listener to the Studio canvas.
  3. On the Listener configuration screen, optionally change the value of the Display Name field.
  4. Specify a value for the Path field (You can put in /chat as the path).
  5. Click the plus sign (+) next to the Connector configuration field to configure a global element that can be used by all instances of the HTTP > Listener source in the app. Select default values if needed.
  6. On the General tab, specify the connection information for the connector.
  7. On the TLS tab, optionally specify the TLS information for the connector.
  8. On the Advanced tab, optionally specify reconnection information, including a reconnection strategy.
  9. Click Test Connection to confirm that Mule can connect with the specified server.
  10. Click OK.

Step 3 — Select the Inference Connector for Your Mule Project

Nomenclature: What is Inference?
Inference is when a trained language model takes input text and computes an output using its existing training. During inference, the model does not learn, update itself, or remember previous requests.

To enable inference in your Mule application, add the Mule Inference Connector to your Mule project. When you add the connector, Anypoint Studio automatically:

This allows your Mule flow to invoke a language model for inference in a managed, secure, and observable way.

  1. In Mule Palette, click (X) Search in Exchange. You may need to authenticate to Mule using the Mule trial account you created earlier.
  2. In Add Dependencies to Project, type mulesoft inference in the search field.
  3. Click MuleSoft Inference Connector in Available modules.
  4. Click Add.
  5. Click Finish.

Step 4 — Add a Connector Operation to the Flow

When you add a connector operation to your flow, you are specifying an action for that connector to perform.
To add an operation for MuleSoft Inference Connector, follow these steps:

  1. In Mule Palette, select MuleSoft Inference Connector and then select the operation to add.
  2. Drag the operation, that is, [Chat] Completions, onto the Studio canvas next to the source (the HTTP Listener).

When you configure the connector, you also configure a global element. The global element holds the authentication credentials that the connector requires to access the target inference provider.

Following are the main values you need for global element configuration:

Click Test Connection.
Click OK.

Step 5 — Test locally

What our application in Anypoint Studio does: We created an API that exposes POST /chat and delegates inference to the Inference Connector.

Let’s test it now.

Select your project in the left sidebar in Anypoint Studio. Then right-click it and select Run As → Mule Application.

In a Mac terminal window paste this curl command:

curl -X POST http://localhost:8081/chat \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'

You should see the following response: {"response":"The capital of Germany is Berlin."}

Gotcha: Sometimes the quotes (especially the single quote at the end) get corrupted when you copy the command from this document. If that happens, retype the quotes.

Windows: If you are using Windows (for example, your Mule VM runs Windows), use the following curl command.

curl -X POST http://localhost:8081/chat -H "Content-Type: application/json" -d "[{\"role\":\"user\",\"content\":\"Capital of Germany?\"}]"
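One way to sidestep the quoting problems described above is to generate the command instead of copying it. The following is a minimal sketch using the Python standard library; the helper names are hypothetical. `shlex.join` (Python 3.8 or later) quotes arguments for POSIX shells, and `subprocess.list2cmdline` applies the Microsoft C runtime quoting rules that the Windows command above relies on.

```python
import json
import shlex
import subprocess


def curl_args(url: str, question: str) -> list:
    # Compact JSON body with the same shape as the commands in this guide.
    body = json.dumps([{"role": "user", "content": question}], separators=(",", ":"))
    return ["curl", "-X", "POST", url, "-H", "Content-Type: application/json", "-d", body]


def posix_command(url: str, question: str) -> str:
    # Quotes each argument for POSIX shells (macOS, Linux).
    return shlex.join(curl_args(url, question))


def windows_command(url: str, question: str) -> str:
    # Escapes embedded double quotes as \" inside "...".
    # It does not escape cmd.exe metacharacters such as % or &.
    return subprocess.list2cmdline(curl_args(url, question))


print(windows_command("http://localhost:8081/chat", "Capital of Germany?"))
print(posix_command("http://localhost:8081/chat", "Capital of Germany?"))
```

For the question used in this guide, the first printed line matches the Windows command shown above. Questions that contain characters such as % or & may still need manual escaping in the Windows command shell.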

Step 6 — Deploy to CloudHub

Deploy on Cloud

  1. Right-click the project → Anypoint Platform → Deploy to CloudHub
  2. Confirm the username you are logged in as is your trial account username.
  3. Select Sandbox
  4. Choose:
    • Runtime: Mule 4.9+
    • Worker size: 0.1 vCore is enough for a demo

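In the Properties tab, set the properties as follows. The property names must match the placeholders that your connector configuration references; the packaged JAR used in the hands-on section reads the model name from groq.model.

groq.apiKey={yourAPIKEY}
Connection=Groq
groqModelName=llama-3.3-70b-versatile

CloudHub Properties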

Deploy Application

Deployment takes several minutes. You can monitor the progress in the Anypoint Studio console.

Now log in to MuleSoft with your trial account in the browser and navigate to Runtimes → Runtime Manager → Sandbox.

Click on your application (it should say Running).

Copy the public endpoint that you see on top right of the application page, e.g., https://inference-chat-xyz111.usa-e2.cloudhub.io/

Test your application

In a terminal window paste this curl command:

curl -X POST "https://inference-chat-xyz111.usa-e2.cloudhub.io/chat" \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'

You should see the correct response displayed in your terminal window.

Gotcha: Sometimes the quotes (especially the single quote at the end) get corrupted when you copy the curl command from this document. If that happens, retype the quotes.

Gotcha: If you cannot deploy from Anypoint Studio (possibly because of enhanced permission requirements), deploy directly from CloudHub instead. Go to mulesoft.com (sign in) → Runtime Manager → Deploy Application. Upload your JAR file, then choose the Properties tab and fill in the property values as above. Then click Deploy Application.

We have built a UI-based client you can use to test your headless agentic capability.

UI Client again

https://thin-chat-client-085335f023d7.herokuapp.com/

This GUI client lets you interact with the headless agent powered by the Mule Inference Connector on CloudHub.


Appendix

Deploy on CloudHub Directly from JAR File

In this scenario, we skip Anypoint Studio completely.

We import the instructor-provided JAR file directly into CloudHub in your Mule trial account.

Set in your values for:

Deploy.

Deployment takes several minutes. Because this scenario does not use Anypoint Studio, monitor the progress in Runtime Manager.

Log in to MuleSoft with your trial account in the browser and navigate to Runtimes → Runtime Manager → Sandbox.

Click on your application (it should say Running).

Copy the Public Endpoint that you see on top right of the application page.
e.g., https://inference-chat-xyz111.usa-e2.cloudhub.io/

Test your application.

In a Mac terminal window paste this curl command (replace the endpoint with your own endpoint):

curl -X POST "https://inference-chat-xyz111.usa-e2.cloudhub.io/chat" \
-H "Content-Type: application/json" \
-d '[
{
"role": "user",
"content": "What is the capital of Germany?"
}
]'

You should see the correct response displayed in your terminal window.

Gotcha: Sometimes the quotes (especially the single quote at the end) get corrupted when you copy the command from this document. If that happens, retype the quotes.

Misc

Export your project: Generate JAR from Anypoint Studio:

You will get: x.jar where x is the name of your current project.

Import a JAR file in Anypoint Studio

When creating a new Mule project, you can skip the build and import a JAR file from a previously completed project. For example, your instructor may have provided you with a JAR file.

A Mule project is generated for you with all the elements you need (from the project that was bundled in the imported JAR file).

Click Chat Completions in the Message Flow window (in the middle of the screen in Anypoint Studio).

Click on the file config.properties in the left sidebar (/src/main/resources)

Set in your values for:

Save your project.

You can now continue with local testing and then deploy to CloudHub from Anypoint Studio.

Reference

How to authenticate in Anypoint Studio with your Mule account: https://docs.mulesoft.com/studio/set-credentials-in-studio-to


Mapping Full-Cycle API Development to Our Lab Work

Traditional enterprise API development in Mule follows a deliberate, governed lifecycle. Each stage has a clear purpose and associated Mule capabilities.

The Different Stages of Full Cycle API Development

Stage 1 — API Design (Contract First)

What happens

Mule features

Why this matters

This is the foundation for governance, security, and reuse.

Stage 2 — Build (Implementation from Spec)

What happens

Mule features

Why this matters

Stage 3 — Deploy & Secure

What happens

Mule features

Why this matters

Stage 4 — Consume & Compose (API Network)

What happens

Mule features

Why this matters

Stage 5 — Monitor & Operate

What happens

Mule features

Why this matters

Why Full-Cycle API Development Is Still Critical

This approach matters because it gives you:

None of these problems disappear with AI. In fact, AI increases the blast radius if these controls are missing.

What Changes in an AI World (and What Does Not)

AI introduces a new execution model.

What changes

What does NOT change

The key architectural shift: Not everything exposed to an AI agent should be a public, consumer-facing API — but everything the agent can reach must still be governed.

This is where Mule becomes more important, not less.

Implementing AI Patterns Using Full-Cycle API Development Concepts

Now let’s connect the full cycle API development concepts directly to our workshop lab module (Stage 1: Enterprise-Grade LLM Activation with Mule Inference Connector).

Stage 1 Pattern: Inference Connector as a System-Level Capability

In Stage 1 of the workshop, we intentionally:

Instead, we created:

This maps cleanly to full-cycle concepts: System API = LLM inference capability; Connector = Inference Connector (Groq); Runtime = CloudHub; Security = environment-scoped secrets; Monitoring = Mule runtime metrics.

This is Stage-0 of an API network, not a violation of it.

How Our Lab Work Becomes “Full-Cycle” in Production

In a real production implementation, you would layer on full-cycle stages around this capability:

Design

Build

Secure

Compose

Monitor

Full-cycle API development is not replaced by AI. It becomes the governance skeleton that allows AI systems to operate safely. Our workshop stages do not bypass API discipline — they reorder it: First, establish a governed execution fabric for AI. Then, layer contracts, orchestration, and experiences on top.

Stage 1 of our workshop highlights that AI execution itself must be treated as a first-class enterprise capability, subject to the same lifecycle thinking that made API programs successful in the first place.

Reference: To explore full-cycle, API-led development, refer to the API Led Connectivity Workshop: https://alc.workshops.mulesoft.com/general/index.html


Thank You.
