Model Context Protocol (MCP) support with new use case (#42)

* initial mcp

* food ordering with mcp

* prompt eng

* splitting out goals and updating docs

* a diff so I can get tests from codex

* a diff so I can get tests from codex

* oops, missing files

* tests, file formatting

* readme and setup updates

* setup.md link fixes

* readme change

* readme change

* readme change

* stripe food setup script

* single agent mode default

* prompt engineering for better multi agent performance

* performance should be greatly improved

* Update goals/finance.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update activities/tool_activities.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* co-pilot PR suggested this change, and now fixed it

* stronger wording around json format response

* formatting

* moved docs to dir

* moved image assets under docs

* cleanup env example, stripe guidance

* cleanup

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This commit is contained in:
Steve Androulakis
2025-06-09 16:39:57 -07:00
committed by GitHub
parent 1811e4cf59
commit 5d55a9fe80
49 changed files with 3268 additions and 279 deletions

10
docs/README.md Normal file

@@ -0,0 +1,10 @@
# Documentation Index
- **architecture.md** - Overview of system components and how they interact.
- **architecture-decisions.md** - Rationale behind key design choices.
- **changelog.md** - Project history and notable changes.
- **contributing.md** - How to contribute and run tests.
- **setup.md** - Installation and configuration instructions.
- **testing.md** - Commands for running the test suite.
- **adding-goals-and-tools.md** - Guide to extending the agent with new goals and tools.
- **todo.md** - Planned enhancements and future work.

177
docs/adding-goals-and-tools.md Normal file

@@ -0,0 +1,177 @@
# Customizing the Agent
The agent operates in single-agent mode by default, focusing on one specific goal. It also supports an experimental multi-agent mode in which users can run multiple agents, each with its own goal, and can switch to choosing a new goal at the end of every successful goal (or even mid-goal).
A goal can use two types of tools:
- **Native Tools**: Custom tools implemented directly in the codebase (in `/tools/`)
- **MCP Tools**: External tools accessed via Model Context Protocol (MCP) servers
It may be helpful to review the [architecture](./architecture.md) for a guide and definition of goals, tools, etc.
## Adding a New Goal Category
Goal Categories let you pick which groups of goals to show in multi-agent mode. Set them via the `GOAL_CATEGORIES` setting in your `.env` file.
Even if you don't intend to use the goal in a multi-agent scenario, goal categories are useful for organization and discovery.
1. Pick a unique category name that has some business meaning
2. Use it in your [.env](./.env) file
3. Add to [.env.example](./.env.example)
4. Use it in your Goal definition, see below.
## Adding a Goal
1. Create a new Python file in the `/goals/` directory (e.g., `goals/my_category.py`) - these files contain descriptions of goals and the tools used to achieve them
2. Pick a name for your goal! (such as "goal_hr_schedule_pto")
3. Fill out the required elements:
- `id`: needs to be the same as the name
- `agent_name`: user-facing name for the agent/chatbot
- `category_tag`: category for the goal
- `agent_friendly_description`: user-facing description of what the agent/chatbot does
- `tools`: the list of **native tools** the goal uses. These are defined in [tools/tool_registry.py](tools/tool_registry.py) as `tool_registry.[name_of_tool]`
Example:
```python
tools=[
tool_registry.current_pto_tool,
tool_registry.future_pto_calc_tool,
tool_registry.book_pto_tool,
]
```
- `mcp_server_definition`: (Optional) MCP server configuration for external tools. Can use predefined configurations from `shared/mcp_config.py` or define custom ones. See [MCP Tools section](#adding-mcp-tools) below.
- `description`: LLM-facing description of the goal that lists all tools (native and MCP) by name and purpose.
- `starter_prompt`: LLM-facing first prompt given to begin the scenario. This field can contain instructions that are different from other goals, like "begin by providing the output of the first tool" rather than waiting on user confirmation. (See [goal_choose_agent_type](tools/goal_registry.py) for an example.)
- `example_conversation_history`: LLM-facing sample conversation/interaction regarding the goal. See the existing goals for how to structure this.
4. Add your new goal to a list variable (e.g., `my_category_goals: List[AgentGoal] = [your_super_sweet_new_goal]`)
5. Import and extend the goal list in `goals/__init__.py` by adding:
- Import: `from goals.my_category import my_category_goals`
- Extend: `goal_list.extend(my_category_goals)`
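Putting the steps above together, here is a minimal sketch of a goal definition. It is illustrative, not copied from the codebase: `AgentGoal` is assumed to live alongside [ToolArgument](./models/tool_definitions.py) in `models/tool_definitions.py`, and the `example_conversation_history` value should be structured like those in the existing goals.
```python
from typing import List

import tools.tool_registry as tool_registry
from models.tool_definitions import AgentGoal  # assumed location

goal_hr_schedule_pto = AgentGoal(
    id="goal_hr_schedule_pto",
    agent_name="PTO Helper",
    category_tag="hr",
    agent_friendly_description="Checks your PTO balance and books time off.",
    tools=[
        tool_registry.current_pto_tool,
        tool_registry.future_pto_calc_tool,
        tool_registry.book_pto_tool,
    ],
    description="Help the user gather args for these tools in order: "
    "1. CurrentPTO: Tell the user how much PTO they currently have "
    "2. FuturePTO: Tell the user how much PTO they will have as of the prospective date "
    "3. BookPTO: Book PTO ",
    starter_prompt="Welcome the user and ask how you can help with PTO.",
    example_conversation_history="(structure this like the existing goals)",
)

hr_goals: List[AgentGoal] = [goal_hr_schedule_pto]
```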
## Adding Native Tools
Native tools are custom implementations that run directly in your codebase. Use these for business logic specific to your application.
### Note on Optional Tools
Tools can be optional - you can indicate this in the tool listing of the goal description (see the section above re: the goal registry) by adding something like, "This step is optional and can be skipped by moving to the next tool." Here is an example from an older iteration of the `goal_hr_schedule_pto` goal, when it was going to have an optional step to check for existing calendar conflicts:
```
description="Help the user gather args for these tools in order: "
"1. CurrentPTO: Tell the user how much PTO they currently have "
"2. FuturePTO: Tell the user how much PTO they will have as of the prospective date "
"3. CalendarConflict: Tell the user what conflicts if any exist around the prospective date on a list of calendars. This step is optional and can be skipped by moving to the next tool. "
"4. BookPTO: Book PTO "
```
Tools should return meaningful information and be failsafe, producing a useful result for any reasonable input.
If you're using a local data approach like those in [tools/data/](./tools/data/), it's good to document how the data can be set up to get a good result in the tool-specific sections of [setup](./setup.md).
### Add to Tool Registry
1. Open [/tools/tool_registry.py](tools/tool_registry.py) - this file contains mapping of tool names to tool definitions (so the AI understands how to use them)
2. Define the tool
- `name`: name of the tool - this should match the tool name given in the goal description's list of tools, so the AI can connect the two. If the description lists "CurrentPTO" as a tool, the registry entry should be `current_pto_tool` with `name` matching "CurrentPTO".
- `description`: LLM-facing description of tool
- `arguments`: These are the _input_ arguments to the tool. Each input argument should be defined as a [ToolArgument](./models/tool_definitions.py). Tools don't have to have arguments but the arguments list has to be declared. If the tool you're creating doesn't have inputs, define arguments as `arguments=[]`
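A sketch of a registry entry following these rules (`ToolDefinition` is assumed to live next to `ToolArgument` in `models/tool_definitions.py`; the argument is illustrative):
```python
from models.tool_definitions import ToolArgument, ToolDefinition  # assumed location

current_pto_tool = ToolDefinition(
    name="CurrentPTO",  # matches the name used in the goal description
    description="Look up how much PTO the employee currently has accrued.",
    arguments=[
        ToolArgument(
            name="email",
            type="string",
            description="Email address of the employee to look up",
        ),
    ],
)
```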
### Create Each Native Tool Implementation
- The tools themselves are defined in their own files in `/tools` - you can add a subfolder to organize them; see the HR tools for an example.
- The file name and function name will be the same as each other and should also be the same as the name of the tool, without "tool" - so `current_pto_tool` would be `current_pto.py` with a function named `current_pto` within it.
- The function should have `args: dict` as the input and also return a `dict`
- The return dict should match the output format you specified in the goal's `example_conversation_history`
- Tools are where the user input + model output becomes deterministic. Add validation here to make sure what the system is doing is valid and acceptable
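Following those conventions, a hedged sketch of `tools/current_pto.py` (the lookup data is made up; a real implementation would read [tools/data/](./tools/data/) or call an HR system):
```python
# tools/current_pto.py - illustrative sketch, not the repo's implementation
def current_pto(args: dict) -> dict:
    email = args.get("email", "")

    # This is where probabilistic input becomes deterministic: validate it
    if "@" not in email:
        return {"error": f"'{email}' is not a valid email address"}

    # Mock balance lookup; swap for tools/data/ or a real HR API
    mock_balances = {"taylor@example.com": 120.5}
    return {"email": email, "current_pto_hours": mock_balances.get(email, 0.0)}
```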
### Add to `tools/__init__.py` and the tool get_handler()
- In [tools/__init__.py](./tools/__init__.py), add an import statement for each new native tool as well as an applicable return statement in `get_handler`. The tool name here should match the tool name as described in the goal's `description` field.
Example:
```python
if tool_name == "CurrentPTO":
return current_pto
```
### Update workflow_helpers.py
- Add your new native tool to the static tools list in [workflows/workflow_helpers.py](workflows/workflow_helpers.py) so it's correctly identified as a native tool rather than an MCP tool.
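As a sketch of that registration (the actual list name in `workflow_helpers.py` may differ, so check the file):
```python
# workflows/workflow_helpers.py - illustrative; confirm the real variable name
STATIC_TOOL_NAMES = [
    "CurrentPTO",
    "FuturePTO",
    "BookPTO",
    "MyNewTool",  # add your new native tool's name here
]
```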
## Adding MCP Tools
MCP (Model Context Protocol) tools are external tools provided by MCP servers. They're useful for integrating with third-party services like Stripe, databases, or APIs without implementing custom code.
### Configure MCP Server Definition
You can either use predefined MCP server configurations from `shared/mcp_config.py` or define custom ones.
#### Using Predefined Configurations
```python
from shared.mcp_config import get_stripe_mcp_server_definition
# In your goal definition:
mcp_server_definition=get_stripe_mcp_server_definition(included_tools=["list_products", "create_customer"])
```
#### Custom MCP Server Definition
Add an `mcp_server_definition` to your goal:
```python
mcp_server_definition=MCPServerDefinition(
name="stripe-mcp",
command="npx",
args=[
"-y",
"@stripe/mcp",
"--tools=all",
f"--api-key={os.getenv('STRIPE_API_KEY')}",
],
env=None,
included_tools=[
"list_products",
"list_prices",
"create_customer",
"create_invoice",
"create_payment_link",
],
)
```
### MCP Tool Configuration
- `name`: Identifier for the MCP server
- `command`: Command to start the MCP server (e.g., "npx", "python")
- `args`: Arguments to pass to the command
- `env`: Environment variables for the server (optional)
- `included_tools`: List of specific tools to use from the server (optional - if omitted, all tools are included)
### How MCP Tools Work
- MCP tools are automatically loaded when the workflow starts
- They're dynamically converted to `ToolDefinition` objects
- The system automatically routes MCP tool calls to the appropriate MCP server
- No additional code implementation needed - just configuration
## Tool Confirmation
There are three ways to manage confirmation of tool runs:
1. Arguments confirmation box - confirm tool arguments and execution with a button click
- Can be disabled by env setting: `SHOW_CONFIRM=FALSE`
2. Soft prompt confirmation - ask the model to prompt for confirmation, e.g., “Are you ready to be invoiced for the total cost of the train tickets?” in the [goal_registry](./tools/goal_registry.py).
3. Hard confirmation requirement as a tool argument. See for example the PTO Scheduling Tool:
```Python
ToolArgument(
name="userConfirmation",
type="string",
description="Indication of user's desire to book PTO",
),
```
If you really want to wait for user confirmation, recording it on the Workflow (as a Signal) rather than relying on the LLM to probably get it right, use option #3 (see the sketch below).
I recommend exploring all three. For a demo, decide whether you want the arguments confirmation box in the UI; if not, I'd generally go with option #2, but use #3 for tools where confirmation makes business sense, e.g. tools that take action or write data.
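For option #3, here is a hedged sketch of waiting on a confirmation Signal in a Temporal workflow (illustrative, not the demo's actual `agent_goal_workflow.py`):
```python
from temporalio import workflow


@workflow.defn
class ConfirmedActionWorkflow:  # illustrative stand-in for the agent workflow
    def __init__(self) -> None:
        self.confirmed = False

    @workflow.signal
    async def confirm(self) -> None:
        # Durably recorded in Workflow History, independent of the LLM
        self.confirmed = True

    @workflow.run
    async def run(self) -> str:
        # Block here until the user explicitly confirms via the Signal
        await workflow.wait_condition(lambda: self.confirmed)
        return "tool executed after explicit confirmation"
```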
## Add a Goal & Tools Checklist
### For All Goals:
- [ ] Create goal file in `/goals/` directory (e.g., `goals/my_category.py`)
- [ ] Add goal to the category's goal list in the file
- [ ] Import and extend the goal list in `goals/__init__.py`
- [ ] If a new category, add Goal Category to [.env](./.env) and [.env.example](./.env.example)
### For Native Tools:
- [ ] Add native tools to [tool_registry.py](tools/tool_registry.py)
- [ ] Implement tool functions in `/tools/` directory
- [ ] Add tools to [tools/__init__.py](tools/__init__.py) in the `get_handler()` function
- [ ] Add tool names to static tools list in [workflows/workflow_helpers.py](workflows/workflow_helpers.py)
### For MCP Tools:
- [ ] Add `mcp_server_definition` to your goal configuration (use `shared/mcp_config.py` for common servers)
- [ ] Ensure MCP server is available and properly configured
- [ ] Set required environment variables (API keys, etc.)
- [ ] Test MCP server connectivity before running the agent
- [ ] If creating new MCP server configs, add them to `shared/mcp_config.py` for reuse
And that's it! Happy AI Agent building!

33
docs/architecture-decisions.md Normal file

@@ -0,0 +1,33 @@
# Architecture Decisions
This documents some of the "why" behind the [architecture](./architecture.md).
## AI Models
We wanted to have flexibility to use different models, because this space is changing rapidly and models get better regularly.
We also wanted to let you pick your model of choice. The system is designed to make swapping models simple. For how to do that, check out the [setup guide](./setup.md).
## Temporal
We asked one of the AI models used in this demo to answer this question (lightly edited):
### Reliability and State Management:
Temporal ensures durability and fault tolerance, which are critical for agentic AI systems that involve long-running, complex workflows. For example, it preserves application state across failures, allowing AI agents to resume from where they left off without losing progress. Major AI companies use this for research experiments and agentic flows, where reliability is essential for continuous exploration.
### Handling Complex, Dynamic Workflows:
Agentic AI often involves unpredictable, multi-step processes like web crawling or data searching. Temporal's workflow orchestration simplifies managing these tasks by abstracting complexity, providing features like retries, timeouts, and signals/queries. Temporal makes observability and resuming failed complex experiments and deep searches simple.
### Scalability and Speed:
Temporal enables rapid development and scaling, crucial for AI systems handling large-scale experiments or production workloads. AI model deployment and SRE teams use it to get code to production quickly with scale as a focus, while research teams can (and do!) run hundreds of experiments daily. Temporal customers report a significant reduction in development time (e.g., 20 weeks to 2 weeks for a feature).
### Observability and Debugging:
Agentic AI systems need insight into where processes succeed or fail. Temporal provides end-to-end visibility and durable workflow history, which Temporal customers are using to track agentic flows and understand failure points.
### Simplified Error Handling:
Temporal abstracts failure management (e.g., retries, rollbacks) so developers can focus on AI logic rather than "plumbing" code. This is vital for agentic AI, where external interactions (e.g., APIs, data sources) are prone to failure.
### Flexibility for Experimentation:
For research-heavy agentic AI, Temporal supports dynamic, code-first workflows and easy integration of new signals/queries, aligning with researchers' need to iterate quickly on experimental paths.
In essence, Temporal's value lies in its ability to make agentic AI systems more reliable, scalable, and easier to develop by handling the underlying complexity of distributed workflows for both research and applied AI tasks.
Temporal was built to solve the problems of distributed computing, including scalability, reliability, security, visibility, and complexity. Agentic AI systems are complex distributed systems, so Temporal should fit well. Scaling, security, and productionalization are major pain points in March 2025 for building agentic systems.
In this system Temporal lets you:
- Orchestrate interactions across distributed data stores and tools <br />
- Hold state, potentially over long periods of time <br />
- Self-heal and retry until the (probabilistic) LLM returns valid data <br />
- Support human intervention such as approvals <br />
- Process data retrieval and tool use in parallel for efficiency <br />

74
docs/architecture.md Normal file

@@ -0,0 +1,74 @@
# Elements
These are the main elements of this system. See [architecture decisions](./architecture-decisions.md) for information behind these choices.
In this document we will explain each element and their interactions, and then connect them all at the end.
<img src="./assets/Architecture_elements.png" width="50%" alt="Architecture Elements">
## Workflow
This is a [Temporal Workflow](https://docs.temporal.io/workflows) - a durable straightforward description of the process to be executed. See [agent_goal_workflow.py](./workflows/agent_goal_workflow.py).
Temporal is used to make the process scalable, durable, reliable, secure, and visible.
### Workflow Responsibilities:
- Orchestrates interactive loops:
- LLM Loop: Prompts LLM, durably executes LLM, stores responses
- Interactive Loop: Elicits responses from an input source (in our case a human) and validates them
- Tool Execution Loop: Durably executes Tools
- Keeps record of all interactions ([Signals, Queries, Updates](https://docs.temporal.io/develop/python/message-passing))
- Handles failures gracefully
- Input, LLM and Tool interaction history stored for debugging and analysis
## Activities
These are [Temporal Activities](https://docs.temporal.io/activities). Defined as simple functions, they are auto-retried async/event driven behind the scenes. Activities durably execute Tools and the LLM. See [a sample activity](./activities/tool_activities.py).
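As a hedged sketch of the pattern (class and method names here are illustrative; see the linked file for the real implementation):
```python
from temporalio import activity

from tools import get_handler  # maps a tool name to its Python function


class ToolActivities:
    @activity.defn
    async def run_tool(self, tool_name: str, args: dict) -> dict:
        # If this raises (flaky API, timeout, bad response), Temporal
        # retries the Activity automatically per its retry policy
        handler = get_handler(tool_name)
        return handler(args)
```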
## Tools
Tools define the capabilities of the system. They are simple Python functions (could be in any language as Temporal supports multiple languages).
They are executed by Temporal Activities. They are “just code” - can connect to any API or system. They also are where the deterministic business logic is: you can validate and retry actions using code you write.
Failures are handled gracefully by Temporal.
Activities + Tools turn the probabilistic input from the user and LLM into deterministic action.
## Prompts
Prompts are where the instructions to the LLM are. Prompts are made up of initial instructions, goal instructions, and tool instructions.
See [agent prompts](./prompts/agent_prompt_generators.py) and [goal & tool prompts](./tools/goal_registry.py).
This is where you can add probabilistic business logic to:
- control process flow
- describe what to do
- give examples of interactions
- give instructions and validation for the LLM
## LLM
Probabilistic execution: it will _probably_ do what you tell it to do.
Turns the guidance from the prompts (see [agent prompts](./prompts/agent_prompt_generators.py) and [goal prompts](./tools/goal_registry.py)) into decisions and responses that drive the conversation.
You have a choice of providers - see [setup](./setup.md).
The LLM:
- Drives toward the initial Goal and any subsequent Goals selected by user
- Decides what to do based on input, such as:
- Validates user input for Tools
- Decides when to execute Tools
- Decides on next step for Goal
- Formats input and interprets output for Tools
- Is executed by Temporal Activities
- Has API failures and logical failures handled transparently
## Interaction
Interaction is managed with Temporal Signals and Queries. These are durably stored in Workflow History.
History can be used for analysis and debugging. It's all “just code” so it's easy to add new Signals and Queries.
Input can be very dynamic; it just needs to be serializable.
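As a hedged sketch of a Signal and a Query (handler names follow the test suite; the bodies are illustrative, not the real `agent_goal_workflow.py`):
```python
from temporalio import workflow


@workflow.defn
class AgentGoalWorkflow:  # sketch only
    def __init__(self) -> None:
        self.conversation_history: list[dict] = []
        self.chat_ended = False

    @workflow.signal
    async def user_prompt(self, prompt: str) -> None:
        # Durably recorded in Workflow History before being processed
        self.conversation_history.append({"role": "user", "content": prompt})

    @workflow.signal
    async def end_chat(self) -> None:
        self.chat_ended = True

    @workflow.query
    def get_conversation_history(self) -> list[dict]:
        return self.conversation_history

    @workflow.run
    async def run(self) -> None:
        # The real loop prompts the LLM and executes tools; elided here
        await workflow.wait_condition(lambda: self.chat_ended)
```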
The Workflow executes the Interaction Loop: gathering input, validating input, and providing a response:
![Interaction Loop](./assets/interaction_loop.png)
Here's a more detailed example for gathering inputs for Tools:
![Tool Gathering](./assets/argument_gathering_cycle.png)
# Architecture Model
Now that we have the pieces and what they do, here is a more complete diagram of how the pieces work together:
![Architecture](./assets/ai_agent_architecture_model.png "Architecture Model")
# Adding features
Want to add more Goals and Tools? See [adding goals and tools](./adding-goals-and-tools.md). Have fun!

4 binary image files added under docs/assets (38 KiB, 124 KiB, 129 KiB, 136 KiB), not shown.

30
docs/changelog.md Normal file

@@ -0,0 +1,30 @@
# Changelog
All notable changes to this project will be documented in this file.
## [0.2.0] - 2025-04-24
![0.2.0 Changes Screenshot](./assets/0.2.0_changes.jpeg)
### Added
- **Multigoal agent architecture** with dynamic goal switching (`goal_choose_agent_type`, `ListAgents`, `ChangeGoal`).
- See [the architecture guide](./architecture.md) and [setup guide](./setup.md).
- **New goal categories & agents**: HR PTO scheduling/checking, paycheck integration, Financial (balances, money movement, loan application), Ecommerce order tracking.
- See [the guide for adding goals and tools](./adding-goals-and-tools.md).
- **Force Confirmation**: `SHOW_CONFIRM` will show a confirmation box before allowing the agent to run a tool.
- **Grok (`x.ai`) LLM provider** support via `GROK_API_KEY`.
- Extensive **docs**: `setup.md`, `architecture.md`, `architecture-decisions.md`, `adding-goals-and-tools.md`, plus new diagrams & assets.
### Changed
- **UI Confirmation Box** is less 'debug' looking and prettier.
- Package renamed to **`temporal_AI_agent`** and version bumped to **0.2.0** in `pyproject.toml`.
- Environment variables changed (see `.env.example`): (`RAPIDAPI_HOST_*`, `AGENT_GOAL` defaults, `GOAL_CATEGORIES`, `SHOW_CONFIRM`, `FIN_START_REAL_WORKFLOW`).
## [0.1.0] - 2025-01-04
### Added
- **Initial release** of the Temporal AI Agent demo.
- **Single goal agent** architecture with a single goal and agent type.
- This is the agent demoed in the [YouTube video](https://www.youtube.com/watch?v=GEXllEH2XiQ).
[0.2.0]: https://github.com/temporal-community/temporal-ai-agent/pull/29

106
docs/contributing.md Normal file

@@ -0,0 +1,106 @@
# Contributing to the Temporal AI Agent Project
This document provides guidelines for contributing to `temporal-ai-agent`. All setup and installation instructions can be found in [setup.md](./setup.md).
## Getting Started
### Code Style & Formatting
We use `black` for code formatting and `isort` for import sorting to maintain a consistent codebase.
- **Format code:**
```bash
poetry run poe format
```
Or manually:
```bash
poetry run black .
poetry run isort .
```
Please format your code before committing.
### Linting & Type Checking
We use `mypy` for static type checking and other linters configured via `poethepoet` (Poe the Poet).
- **Run linters and type checks:**
```bash
poetry run poe lint
```
Or manually for type checking:
```bash
poetry run mypy --check-untyped-defs --namespace-packages .
```
Ensure all linting and type checks pass before submitting a pull request.
## Testing
Comprehensive testing is crucial for this project. We use `pytest` and Temporal's testing framework.
- **Install test dependencies** (if not already done with `poetry install --with dev`):
```bash
poetry install --with dev
```
- **Run all tests:**
```bash
poetry run pytest
```
- **Run tests with time-skipping (recommended for faster execution, especially in CI):**
```bash
poetry run pytest --workflow-environment=time-skipping
```
For detailed information on test categories, running specific tests, test environments, coverage, and troubleshooting, please refer to:
- [testing.md](./testing.md) (Quick Start and overview)
- [tests/README.md](../tests/README.md) (Comprehensive guide, patterns, and best practices)
**Ensure all tests pass before submitting a pull request.**
## Making Changes
### Adding New Tools or Goals
If you're looking to extend the agent's capabilities:
1. Create your tool implementation in the `tools/` directory.
2. Register your tool and associate it with relevant goals.
For detailed instructions, please see:
- [Agent Customization in AGENTS.md](../AGENTS.md#agent-customization)
- [Adding Goals and Tools Guide](./adding-goals-and-tools.md)
### General Code Changes
- Follow the existing code style and patterns.
- Ensure any new code is well-documented with comments.
- Write new tests for new functionality or bug fixes.
- Update existing tests if necessary.
## Submitting Contributions
### Pull Requests
When you're ready to submit your changes:
1. Push your branch to the remote repository.
2. Open a Pull Request (PR) against the `main` branch.
3. **Describe your changes:** Clearly explain what you changed and why. Reference any related issues.
4. **Ensure tests pass:** All CI checks, including tests and linters, must pass. The command `poetry run pytest --workflow-environment=time-skipping` is a good one to run locally.
5. **Request review:** Request a review from one or more maintainers.
## Reporting Bugs
If you encounter a bug, please:
1. **Search existing issues:** Check if the bug has already been reported.
2. **Open a new issue:** If not, create a new issue.
- Provide a clear and descriptive title.
- Include steps to reproduce the bug.
- Describe the expected behavior and what actually happened.
- Provide details about your environment (OS, Python version, Temporal server version, etc.).
- Include any relevant logs or screenshots.
## Suggesting Enhancements
We welcome suggestions for new features or improvements!
1. **Search existing issues/discussions:** See if your idea has already been discussed.
2. **Open a new issue:**
- Use a clear and descriptive title.
- Provide a detailed explanation of the enhancement and its benefits.
- Explain the use case or problem it solves.
- Include any potential implementation ideas if you have them.
## Key Resources
- **Project Overview**: [README.md](../README.md)
- **Detailed Contribution & Development Guide**: [AGENTS.md](../AGENTS.md)
- **Setup Instructions**: [setup.md](./setup.md)
- **Comprehensive Testing Guide**: [testing.md](./testing.md) and [tests/README.md](../tests/README.md)
- **System Architecture**: [architecture.md](./architecture.md)
- **Architecture Decisions**: [architecture-decisions.md](./architecture-decisions.md)
- **Customizing Agent Tools and Goals**: [adding-goals-and-tools.md](./adding-goals-and-tools.md)
- **To-Do List / Future Enhancements**: [todo.md](./todo.md)

336
docs/setup.md Normal file

@@ -0,0 +1,336 @@
# Setup Guide
## Initial Configuration
This application uses `.env` files for configuration. Copy the [.env.example](.env.example) file to `.env` and update the values:
```bash
cp .env.example .env
```
Then add API keys and configuration as desired.
If you want to show confirmations/enable the debugging UI that shows tool args, set
```bash
SHOW_CONFIRM=True
```
We recommend setting this to `False` in most cases, as it can clutter the conversation with confirmation messages.
### Quick Start with Makefile
We've provided a Makefile to simplify the setup and running of the application. Here are the main commands:
```bash
# Initial setup
make setup # Creates virtual environment and installs dependencies
make setup-venv # Creates virtual environment only
make install # Installs all dependencies
# Running the application
make run-worker # Starts the Temporal worker
make run-api # Starts the API server
make run-frontend # Starts the frontend development server
# Additional services
make run-train-api # Starts the train API server
make run-legacy-worker # Starts the legacy worker
make run-enterprise # Builds and runs the enterprise .NET worker
# Development environment setup
make setup-temporal-mac # Installs and starts Temporal server on Mac
# View all available commands
make help
```
### Manual Setup (Alternative to Makefile)
If you prefer to run commands manually, see the sections below for detailed instructions on setting up the backend, frontend, and other components.
### Agent Goal Configuration
The agent can be configured to pursue different goals using the `AGENT_GOAL` environment variable in your `.env` file.
**Single Agent Mode (Default)**
By default, the agent operates in single-agent mode using a specific goal. If unset, the default is `goal_event_flight_invoice`.
To set a specific single goal:
```bash
AGENT_GOAL=goal_event_flight_invoice
```
**Multi-Agent Mode (Experimental)**
The agent also supports an experimental multi-agent mode where users can choose between different agent types during the conversation. To enable this mode:
```bash
AGENT_GOAL=goal_choose_agent_type
```
When using multi-agent mode, you can control which agent categories are available using `GOAL_CATEGORIES` in your `.env` file. If unset, all categories are shown. Available categories include `hr`, `travel-flights`, `travel-trains`, `fin`, `ecommerce`, `mcp-integrations`, and `food`.
We recommend starting with `fin`:
```bash
GOAL_CATEGORIES=hr,travel-flights,travel-trains,fin
```
**Note:** Multi-agent mode is experimental and allows switching between different agents mid-conversation, but single-agent mode provides a more focused experience.
MCP (Model Context Protocol) tools are available for enhanced integration with external services. See the [MCP Tools Configuration](#mcp-tools-configuration) section for setup details.
See the section Goal-Specific Tool Configuration below for tool configuration for specific goals.
### LLM Configuration
Note: We recommend using OpenAI's GPT-4o or Claude 3.5 Sonnet for the best results. There can be significant differences in performance and capabilities between models, especially for complex tasks.
The agent uses LiteLLM to interact with various LLM providers. Configure the following environment variables in your `.env` file:
- `LLM_MODEL`: The model to use (e.g., "openai/gpt-4o", "anthropic/claude-3-sonnet", "google/gemini-pro", etc.)
- `LLM_KEY`: Your API key for the selected provider
- `LLM_BASE_URL`: (Optional) Custom base URL for the LLM provider. Useful for:
- Using Ollama with a custom endpoint
- Using a proxy or custom API gateway
- Testing with different API versions
LiteLLM will automatically detect the provider based on the model name. For example:
- For OpenAI models: `openai/gpt-4o` or `openai/gpt-3.5-turbo`
- For Anthropic models: `anthropic/claude-3-sonnet`
- For Google models: `google/gemini-pro`
- For Ollama models: `ollama/mistral` (requires `LLM_BASE_URL` set to your Ollama server)
Example configurations:
```bash
# For OpenAI
LLM_MODEL=openai/gpt-4o
LLM_KEY=your-api-key-here
# For Anthropic
LLM_MODEL=anthropic/claude-3-sonnet
LLM_KEY=your-api-key-here
# For Ollama with custom URL
LLM_MODEL=ollama/mistral
LLM_BASE_URL=http://localhost:11434
```
For a complete list of supported models and providers, visit the [LiteLLM documentation](https://docs.litellm.ai/docs/providers).
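Under the hood, these variables feed a LiteLLM call roughly like the following sketch (the demo wraps the real call in a Temporal Activity; this standalone snippet only shows how the variables map):
```python
import os

from litellm import completion

response = completion(
    model=os.environ["LLM_MODEL"],  # e.g. "openai/gpt-4o"
    api_key=os.environ["LLM_KEY"],
    base_url=os.environ.get("LLM_BASE_URL"),  # optional, e.g. an Ollama server
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)
```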
## Configuring Temporal Connection
By default, this application will connect to a local Temporal server (`localhost:7233`) in the default namespace, using the `agent-task-queue` task queue. You can override these settings in your `.env` file.
### Use Temporal Cloud
See [.env.example](.env.example) for details on connecting to Temporal Cloud using mTLS or API key authentication.
[Sign up for Temporal Cloud](https://temporal.io/get-cloud)
### Use a local Temporal Dev Server
On a Mac
```bash
brew install temporal
temporal server start-dev
```
See the [Temporal documentation](https://learn.temporal.io/getting_started/python/dev_environment/) for other platforms.
You can also run a local Temporal server using Docker Compose. See the `Development with Docker` section below.
## Running the Application
### Docker
- All services are defined in `docker-compose.yml` (includes a Temporal server).
- **Dev overrides** (mounted code, live-reload commands) live in `docker-compose.override.yml` and are **auto-merged** on `docker compose up`.
- To start **development** mode (with hot-reload):
```bash
docker compose up -d
# quick rebuild without infra:
docker compose up -d --no-deps --build api train-api worker frontend
```
- To run **production** mode (ignore dev overrides):
```bash
docker compose -f docker-compose.yml up -d
```
Default URLs:
* Temporal UI: [http://localhost:8080](http://localhost:8080)
* API: [http://localhost:8000](http://localhost:8000)
* Frontend: [http://localhost:5173](http://localhost:5173)
### Local Machine (no docker)
**Python Backend**
Requires [Poetry](https://python-poetry.org/) to manage dependencies.
1. `python -m venv venv`
2. `source venv/bin/activate`
3. `poetry install`
Run the following commands in separate terminal windows:
1. Start the Temporal worker:
```bash
poetry run python scripts/run_worker.py
```
2. Start the API server:
```bash
poetry run uvicorn api.main:app --reload
```
Access the API at `/docs` to see the available endpoints.
**React UI**
Start the frontend:
```bash
cd frontend
npm install
npx vite
```
Access the UI at `http://localhost:5173`
## MCP Tools Configuration
MCP (Model Context Protocol) tools enable integration with external services without custom implementation. The system automatically handles MCP server lifecycle and tool discovery.
### Adding MCP Tools to Goals
Configure MCP servers in your goal definitions using either:
1. Predefined configurations from `shared/mcp_config.py`
2. Custom `MCPServerDefinition` objects
Example using Stripe MCP Server:
```python
from shared.mcp_config import get_stripe_mcp_server_definition
mcp_server_definition=get_stripe_mcp_server_definition(
included_tools=["list_products", "create_customer", "create_invoice"]
)
```
See the file `goals/stripe_mcp.py` for an example of how to use MCP tools in an `AgentGoal`.
### MCP Environment Variables
Set required API keys and configuration in your `.env` file:
```bash
# For Stripe MCP Server
STRIPE_API_KEY=sk_test_your_stripe_key_here
```
`goal_event_flight_invoice` does not require a Stripe key. If `STRIPE_API_KEY` is unset, that scenario falls back to a mock invoice.
#### Accessing Your Test API Keys
It's free to sign up for a Stripe account and generate test keys (no real money is involved). Use the Developers Dashboard to create, reveal, delete, and rotate API keys. Navigate to the API Keys tab in your dashboard or visit [https://dashboard.stripe.com/test/apikeys](https://dashboard.stripe.com/test/apikeys) directly.
For detailed guidance on adding MCP tools, see [adding-goals-and-tools.md](./adding-goals-and-tools.md).
## Goal-Specific Tool Configuration
Here is configuration guidance for specific goals. Travel and financial goals require the configuration and setup described below.
### Goal: Find an event in Australia / New Zealand, book flights to it and invoice the user for the cost
- `AGENT_GOAL=goal_event_flight_invoice` - Helps users find events, book flights, and arrange train travel with invoice generation
- This is the scenario in the [original video](https://www.youtube.com/watch?v=GEXllEH2XiQ)
#### Configuring Agent Goal: goal_event_flight_invoice
* The agent uses a mock function to search for events. This has zero configuration.
* **Flight Search**: The agent intelligently handles flight searches:
* **Default behavior**: If no `RAPIDAPI_KEY` is set, the agent generates realistic flight data with smart pricing based on route type (domestic, international, trans-Pacific)
* **Real API (optional)**: To use live flight data, set `RAPIDAPI_KEY` in your `.env` file
* It's free to sign up at [RapidAPI](https://rapidapi.com/apiheya/api/sky-scrapper)
* This API might be slow to respond, so you may want to increase the start-to-close timeout, `TOOL_ACTIVITY_START_TO_CLOSE_TIMEOUT`, in `workflows/workflow_helpers.py`
* The smart generation creates realistic pricing (e.g., US-Australia routes $1200-1800, domestic flights $200-800) with appropriate airlines for each region
* Requires a Stripe key for the `create_invoice` tool. Set this in the `STRIPE_API_KEY` environment variable in `.env`
* It's free to sign up and get a key at [Stripe](https://stripe.com/) (test mode only, no real money)
* Set permissions for read-write on: `Credit Notes, Invoices, Customers and Customer Sessions`
* If you don't have a Stripe key, comment out the `STRIPE_API_KEY` in the `.env` file, and a dummy invoice will be created rather than a Stripe invoice. The function can be found in `tools/create_invoice.py`; this is the default behavior for `goal_event_flight_invoice`.
### Goal: Find a Premier League match, book train tickets to it and invoice the user for the cost (Replay 2025 Keynote)
- `AGENT_GOAL=goal_match_train_invoice` - Focuses on Premier League match attendance with train booking and invoice generation
- This goal was part of [Temporal's Replay 2025 conference keynote demo](https://www.youtube.com/watch?v=YDxAWrIBQNE)
- Note: there is failure built into this demo (the train booking step) to show how the agent can handle failures and retry. See Tool Configuration below for details.
#### Configuring Agent Goal: goal_match_train_invoice
NOTE: This goal was developed for an on-stage demo and has failure (and its resolution) built in to show how the agent can handle failures and retry.
* Omit `FOOTBALL_DATA_API_KEY` from .env for the `SearchFixtures` tool to automatically return mock Premier League fixtures. Finding a real match requires a key from [Football Data](https://www.football-data.org). Sign up for a free account, then see the 'My Account' page to get your API token.
* We use a mock function to search for trains. Start the train API server to use the real API: `python thirdparty/train_api.py`
  * The train activity is 'enterprise', so it's written in C# and requires a .NET runtime. See the .NET (enterprise) Worker section below for details on running it.
* Requires a Stripe key for the `create_invoice` tool. Set this in the `STRIPE_API_KEY` environment variable in `.env`
* It's free to sign up and get a key at [Stripe](https://stripe.com/) (test mode only)
* If the key is missing, this goal won't generate a real invoice; only `goal_event_flight_invoice` falls back to a mock invoice.
* If you'd rather skip Stripe, go to `tools/create_invoice.py` and replace the `create_invoice` function with the mock `create_invoice_example` that exists in the same file.
##### Python Search Trains API
> Agent Goal: goal_match_train_invoice only
Required to search and book trains!
```bash
poetry run python thirdparty/train_api.py
# example url
# http://localhost:8080/api/search?from=london&to=liverpool&outbound_time=2025-04-18T09:00:00&inbound_time=2025-04-20T09:00:00
```
##### Python Train Legacy Worker
> Agent Goal: goal_match_train_invoice only
These are Python activities that fail (raise NotImplemented) to show how Temporal handles a failure. You can run these activities with:
```bash
poetry run python scripts/run_legacy_worker.py
```
The activity will fail and be retried infinitely. To rescue the activity (and its corresponding workflows), kill the worker and run the .NET one in the section below.
##### .NET (enterprise) Worker ;)
We have activities written in C# to call the train APIs.
```bash
cd enterprise
dotnet build # ensure you brew install dotnet@8 first!
dotnet run
```
If you're running your train API above on a different host/port then change the API URL in `Program.cs`. Otherwise, be sure to run it using `python thirdparty/train_api.py`.
#### Goals: FIN - Money Movement and Loan Application
Make sure you have the mock users you want (such as yourself) in [the account mock data file](./tools/data/customer_account_data.json).
- `AGENT_GOAL=goal_fin_move_money` - This scenario _can_ initiate a secondary workflow to move money. Check out [this repo](https://github.com/temporal-sa/temporal-money-transfer-java) - you'll need to get the worker running and connected to the same account as the agentic worker.
By default it will _not_ make a real workflow, it'll just fake it. If you get the worker running and want to start a workflow, in your [.env](./.env):
```bash
FIN_START_REAL_WORKFLOW=FALSE #set this to true to start a real workflow
```
- `AGENT_GOAL=goal_fin_loan_application` - This scenario _can_ initiate a secondary workflow to apply for a loan. Check out [this repo](https://github.com/temporal-sa/temporal-latency-optimization-scenarios) - you'll need to get the worker running and connected to the same account as the agentic worker.
By default it will _not_ make a real workflow, it'll just fake it. If you get the worker running and want to start a workflow, in your [.env](./.env):
```bash
FIN_START_REAL_WORKFLOW=FALSE #set this to true to start a real workflow
```
#### Goals: HR/PTO
Make sure you have the mock users you want (such as yourself) in [the PTO mock data file](./tools/data/employee_pto_data.json).
#### Goals: Ecommerce
Make sure you have the mock orders you want (such as those with real tracking numbers) in [the mock orders file](./tools/data/customer_order_data.json).
### Goal: Food Ordering with MCP Integration (Stripe Payment Processing)
- `AGENT_GOAL=goal_food_ordering` - Demonstrates food ordering with Stripe payment processing via MCP
- Uses Stripe's MCP Server ([Agent Toolkit](https://github.com/stripe/agent-toolkit/tree/main/modelcontextprotocol)) for payment operations
- Requires `STRIPE_API_KEY` in your `.env` file
- Requires products in Stripe with metadata key `use_case=food_ordering_demo`. Run `tools/food/setup/create_stripe_products.py` to set up pizza menu items
- Example of MCP tool integration without custom implementation
- This is an excellent demonstration of MCP (Model Context Protocol) capabilities
## Customizing the Agent Further
- `tool_registry.py` contains the mapping of tool names to tool definitions (so the AI understands how to use them)
- `goals/` contains descriptions of goals and the tools used to achieve them
- The tools themselves are defined in their own files in `/tools`
For more details, check out [adding goals and tools guide](./adding-goals-and-tools.md).
## Setup Checklist
[ ] Copy `.env.example` to `.env` <br />
[ ] Select an LLM and add your API key to `.env` <br />
[ ] (Optional) set your starting goal and goal category in `.env` <br />
[ ] (Optional) configure your Temporal Cloud settings in `.env` <br />
[ ] `poetry run python scripts/run_worker.py` <br />
[ ] `poetry run uvicorn api.main:app --reload` <br />
[ ] `cd frontend`, `npm install`, `npx vite` <br />
[ ] Access the UI at `http://localhost:5173` <br />
And that's it! Happy AI Agent Exploring!

163
docs/testing.md Normal file

@@ -0,0 +1,163 @@
# Testing the Temporal AI Agent
This guide provides instructions for running the comprehensive test suite for the Temporal AI Agent project.
## Quick Start
1. **Install dependencies**:
```bash
poetry install --with dev
```
2. **Run all tests**:
```bash
poetry run pytest
```
3. **Run with time-skipping for faster execution**:
```bash
poetry run pytest --workflow-environment=time-skipping
```
## Test Categories
### Unit Tests
- **Activity Tests**: `tests/test_tool_activities.py`
- LLM integration (mocked)
- Environment configuration
- JSON processing
- Dynamic tool execution
### Integration Tests
- **Workflow Tests**: `tests/test_agent_goal_workflow.py`
- Full workflow execution
- Signal and query handling
- State management
- Error scenarios
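The workflow tests follow Temporal's standard testing pattern. Here is a hedged, self-contained sketch of that pattern (the trivial workflow below stands in for `AgentGoalWorkflow`):
```python
import pytest
from temporalio import workflow
from temporalio.testing import WorkflowEnvironment
from temporalio.worker import Worker


@workflow.defn
class EchoWorkflow:  # stand-in for AgentGoalWorkflow
    @workflow.run
    async def run(self, text: str) -> str:
        return text


@pytest.mark.asyncio
async def test_echo_workflow():
    # Time-skipping environment: timers fire instantly, so tests run fast
    async with await WorkflowEnvironment.start_time_skipping() as env:
        async with Worker(
            env.client, task_queue="test-queue", workflows=[EchoWorkflow]
        ):
            result = await env.client.execute_workflow(
                EchoWorkflow.run,
                "hello",
                id="echo-test",
                task_queue="test-queue",
            )
            assert result == "hello"
```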
## Running Specific Tests
```bash
# Run only activity tests
poetry run pytest tests/test_tool_activities.py -v
# Run only workflow tests
poetry run pytest tests/test_agent_goal_workflow.py -v
# Run a specific test
poetry run pytest tests/test_tool_activities.py::TestToolActivities::test_sanitize_json_response -v
# Run tests matching a pattern
poetry run pytest -k "validation" -v
```
## Test Environment Options
### Local Environment (Default)
```bash
poetry run pytest --workflow-environment=local
```
### Time-Skipping Environment (Recommended for CI)
```bash
poetry run pytest --workflow-environment=time-skipping
```
### External Temporal Server
```bash
poetry run pytest --workflow-environment=localhost:7233
```
## Environment Variables
Tests can be configured with these environment variables:
- `LLM_MODEL`: Model for LLM testing (default: "openai/gpt-4")
- `LLM_KEY`: API key for LLM service (mocked in tests)
- `LLM_BASE_URL`: Custom LLM endpoint (optional)
## Test Coverage
The test suite covers:
✅ **Workflows**
- AgentGoalWorkflow initialization and execution
- Signal handling (user_prompt, confirm, end_chat)
- Query methods (conversation history, agent goal, tool data)
- State management and conversation flow
- Validation and error handling
✅ **Activities**
- ToolActivities class methods
- LLM integration (mocked)
- Environment variable handling
- JSON response processing
- Dynamic tool activity execution
✅ **Integration**
- End-to-end workflow execution
- Activity registration in workers
- Temporal client interactions
## Test Output
Successful test run example:
```
============================== test session starts ==============================
platform darwin -- Python 3.11.3, pytest-8.3.5, pluggy-1.5.0
rootdir: /Users/steveandroulakis/Documents/Code/agentic/temporal-demo/temporal-ai-agent
configfile: pyproject.toml
plugins: anyio-4.5.2, asyncio-0.26.0
collected 21 items
tests/test_tool_activities.py::TestToolActivities::test_sanitize_json_response PASSED
tests/test_tool_activities.py::TestToolActivities::test_parse_json_response_success PASSED
tests/test_tool_activities.py::TestToolActivities::test_get_wf_env_vars_default_values PASSED
...
============================== 21 passed in 12.5s ==============================
```
## Troubleshooting
### Common Issues
1. **Module not found errors**: Run `poetry install --with dev`
2. **Async warnings**: These are expected with pytest-asyncio and can be ignored
3. **Test timeouts**: Use `--workflow-environment=time-skipping` for faster execution
4. **Import errors**: Check that you're running tests from the project root directory
### Debugging Tests
Enable verbose logging:
```bash
poetry run pytest --log-cli-level=DEBUG -s
```
Run with coverage:
```bash
poetry run pytest --cov=workflows --cov=activities
```
## Continuous Integration
For CI environments, use:
```bash
poetry run pytest --workflow-environment=time-skipping --tb=short
```
## Additional Resources
- See `tests/README.md` for detailed testing documentation
- Review `tests/conftest.py` for available test fixtures
- Check individual test files for specific test scenarios
## Test Architecture
The tests use:
- **Temporal Testing Framework**: For workflow and activity testing
- **pytest-asyncio**: For async test support
- **unittest.mock**: For mocking external dependencies
- **Test Fixtures**: For consistent test data and setup
All external dependencies (LLM calls, file I/O) are mocked to ensure fast, reliable tests.

37
docs/todo.md Normal file

@@ -0,0 +1,37 @@
# todo list
## General Agent Enhancements
[ ] Google's A2A is emerging as the standard way to hand off agents to other agents. We should examine implementing this soon.
[ ] Custom metrics/tracing is important for AI specific aspects such as number of LLM calls, number of bad LLM responses that require retrying, number of bad chat outcomes. We should add this.
[ ] Evals are very important in agents. We want to be able to 'judge' the agent's performance both in dev and production (AIOps). This will help us improve our agent's performance over time in a targeted fashion.
[ ] Dynamically switch LLMs on persistent failures: <br />
- detect failure in the activity using a failure count <br />
- activity switches to a secondary LLM defined in .env <br />
- activity reports the switch to the workflow <br />
[ ] Collapse history/summarize chat after goal finished <br />
[ ] Write tests<br />
[ ] non-retry the api key error - "Invalid API Key provided: sk_test_**J..." and "AuthenticationError" <br />
[ ] add visual feedback when workflow starting <br />
[ ] enable user to list agents at any time - like end conversation - probably with a next step<br />
## Ideas for more goals and tools
[ ] Add fintech goals <br />
- Fraud Detection and Prevention - The AI monitors transactions across accounts, flagging suspicious activities (e.g., unusual spending patterns or login attempts) and autonomously freezing accounts or notifying customers and compliance teams.<br />
- Personalized Financial Advice - An AI agent analyzes a customer's financial data (e.g., income, spending habits, savings, investments) and provides tailored advice, such as budgeting tips, investment options, or debt repayment strategies.<br />
- Portfolio Management and Rebalancing - The AI monitors a customer's investment portfolio, rebalancing it automatically based on market trends, risk tolerance, and financial goals (e.g., shifting assets between stocks, bonds, or crypto).<br />
[ ] new loan/fraud check/update with start <br />
[ ] financial advice - args being freeform customer input about their financial situation and goals
[ ] tool is maybe a new tool asking the LLM to advise
[ ] for demo simulate failure - add utilities/simulated failures from pipeline demo <br />