Model Context Protocol (MCP) support with new use case (#42)

* initial mcp * food ordering with mcp * prompt eng * splitting out goals and updating docs * a diff so I can get tests from codex * a diff so I can get tests from codex * oops, missing files * tests, file formatting * readme and setup updates * setup.md link fixes * readme change * readme change * readme change * stripe food setup script * single agent mode default * prompt engineering for better multi agent performance * performance should be greatly improved * Update goals/finance.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update activities/tool_activities.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * co-pilot PR suggested this change, and now fixed it * stronger wording around json format response * formatting * moved docs to dir * moved image assets under docs * cleanup env example, stripe guidance * cleanup --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-16 22:48:09 +01:00 · 2025-06-09 16:39:57 -07:00
parent 1811e4cf59
commit 5d55a9fe80
49 changed files with 3268 additions and 279 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,10 @@
 # Temporal AI Agent

-This demo shows a multi-turn conversation with an AI agent running inside a Temporal workflow. The purpose of the agent is to collect information towards a goal, running tools along the way. There's a simple DSL input for collecting information (currently set up to use mock functions to search for public events, search for flights around those events, then create a test Stripe invoice for the trip).
+This demo shows a multi-turn conversation with an AI agent running inside a Temporal workflow. The purpose of the agent is to collect information towards a goal, running tools along the way. The agent supports both native tools and Model Context Protocol (MCP) tools, allowing it to interact with external services.
+
+The agent operates in single-agent mode by default, focusing on one specific goal. It also supports experimental multi-agent/multi-goal mode where users can choose between different agent types and switch between them during conversations.
+
+Goals are organized in the `/goals/` directory by category (finance, HR, travel, ecommerce, etc.) and can leverage both native and MCP tools.

 The AI will respond with clarifications and ask for any missing information to that goal. You can configure it to use any LLM supported by [LiteLLM](https://docs.litellm.ai/docs/providers), including:
 - OpenAI models (GPT-4, GPT-3.5)
@@ -19,7 +23,7 @@ See multi-agent execution in action [here](https://www.youtube.com/watch?v=8Dc_0

 ## Why Temporal?
 There are a lot of AI and Agentic AI tools out there, and more on the way. But why Temporal? Temporal gives this system reliablity, state management, a code-first approach that we really like, built-in observability and easy error handling.
-For more, check out [architecture-decisions](./architecture-decisions.md).
+For more, check out [architecture-decisions](docs/architecture-decisions.md).

 ## What is "Agentic AI"?
 These are the key elements of an agentic framework:
@@ -31,20 +35,29 @@ These are the key elements of an agentic framework:
 6. Prompt construction made of system prompts, conversation history, and tool metadata - sent to the LLM to create user questions and confirmations
 7. Ideally high durability (done in this system with Temporal Workflow and Activities)

-For a deeper dive into this, check out the [architecture guide](./architecture.md).
+For a deeper dive into this, check out the [architecture guide](docs/architecture.md).
+
+## 🔧 MCP Tool Calling Support
+
+This agent acts as an **MCP (Model Context Protocol) client**, enabling seamless integration with external services and tools. The system supports two types of tools:
+- **Native Tools**: Custom tools implemented directly in the codebase (in `/tools/`)
+ - **MCP Tools**: External tools accessed via Model Context Protocol (MCP) servers like Stripe, databases, or APIs. Configuration is covered in [the Setup guide](docs/setup.md)
+- Set `AGENT_GOAL=goal_food_ordering` with `SHOW_CONFIRM=False` in `.env` for an example of a goal that calls MCP Tools (Stripe).

 ## Setup and Configuration
-See [the Setup guide](./SETUP.md) for detailed instructions. The basic configuration requires just two environment variables:
+See [the Setup guide](docs/setup.md) for detailed instructions. The basic configuration requires just two environment variables:
 ```bash
 LLM_MODEL=openai/gpt-4o  # or any other model supported by LiteLLM
 LLM_KEY=your-api-key-here
 ```

 ## Customizing Interaction & Tools
-See [the guide to adding goals and tools](./adding-goals-and-tools.md).
+See [the guide to adding goals and tools](docs/adding-goals-and-tools.md).
+
+The system supports MCP (Model Context Protocol) for easy integration with external services. MCP server configurations are managed in `shared/mcp_config.py`, and goals are organized by category in the `/goals/` directory.

 ## Architecture
-See [the architecture guide](./architecture.md).
+See [the architecture guide](docs/architecture.md).

 ## Testing

@@ -66,25 +79,24 @@ poetry run pytest --workflow-environment=time-skipping
 - ✅ **Activity Tests**: ToolActivities, LLM integration (mocked), environment configuration
 - ✅ **Integration Tests**: End-to-end workflow and activity execution

-**Documentation:**
- **Quick Start**: [TESTING.md](TESTING.md) - Simple commands to run tests
+- **Quick Start**: [testing.md](docs/testing.md) - Simple commands to run tests
 - **Comprehensive Guide**: [tests/README.md](tests/README.md) - Detailed testing documentation, patterns, and best practices

 ## Development

-To contribute to this project, see [CONTRIBUTING.md](CONTRIBUTING.md).
+To contribute to this project, see [contributing.md](docs/contributing.md).

-Start the Temporal Server and API server, see [setup](SETUP.md)
+Start the Temporal Server and API server, see [setup](docs/setup.md)

 ## Productionalization & Adding Features
 - In a prod setting, I would need to ensure that payload data is stored separately (e.g. in S3 or a noSQL db - the claim-check pattern), or otherwise 'garbage collected'. Without these techniques, long conversations will fill up the workflow's conversation history, and start to breach Temporal event history payload limits.
 - A single worker can easily support many agent workflows (chats) running at the same time. Currently the workflow ID is the same each time, so it will only run one agent at a time. To run multiple agents, you can use a different workflow ID each time (e.g. by using a UUID or timestamp).
 - Perhaps the UI should show when the LLM response is being retried (i.e. activity retry attempt because the LLM provided bad output)
- The project now includes comprehensive tests for workflows and activities! [See testing guide](TESTING.md).
+- The project now includes comprehensive tests for workflows and activities! [See testing guide](docs/testing.md).

-See [the todo](./todo.md) for more details on things we want to do (or that you could contribute!).
+See [the todo](docs/todo.md) for more details on things we want to do (or that you could contribute!).

-See [the guide to adding goals and tools](./adding-goals-and-tools.md) for more ways you can add features.
+See [the guide to adding goals and tools](docs/adding-goals-and-tools.md) for more ways you can add features.

 ## Enablement Guide (internal resource for Temporal employees)
 Check out the [slides](https://docs.google.com/presentation/d/1wUFY4v17vrtv8llreKEBDPLRtZte3FixxBUn0uWy5NU/edit#slide=id.g3333e5deaa9_0_0) here and the [enablement guide](https://docs.google.com/document/d/14E0cEOibUAgHPBqConbWXgPUBY0Oxrnt6_AImdiheW4/edit?tab=t.0#heading=h.ajnq2v3xqbu1).