diff --git a/README.md b/README.md index 7650d7d..1a2fad2 100644 --- a/README.md +++ b/README.md @@ -9,24 +9,8 @@ It's really helpful to [watch the demo (5 minute YouTube video)](https://www.you [![Watch the demo](./assets/agent-youtube-screenshot.jpeg)](https://www.youtube.com/watch?v=GEXllEH2XiQ) ## Why Temporal? -There are a lot of AI and Agentic AI tools out there, and more on the way. But why Temporal? I asked one of the AI models used in this demo to answer this question (edited minorly): - -### Reliability and State Management: - Temporal ensures durability and fault tolerance, which are critical for agentic AI systems that involve long-running, complex workflows. For example, it preserves application state across failures, allowing AI agents to resume from where they left off without losing progress. Major AI companies use this for research experiments and agentic flows, where reliability is essential for continuous exploration. -### Handling Complex, Dynamic Workflows: -Agentic AI often involves unpredictable, multi-step processes like web crawling or data searching. Temporal’s workflow orchestration simplifies managing these tasks by abstracting complexity, providing features like retries, timeouts, and signals/queries. Temporal makes observability and resuming failed complex experiments and deep searches simple. -### Scalability and Speed: -Temporal enables rapid development and scaling, crucial for AI systems handling large-scale experiments or production workloads. AI model deployment and SRE teams use it to get code to production quickly with scale as a focus, while research teams can (and do!) run hundreds of experiments daily. Temporal customers report a significant reduction in development time (e.g., 20 weeks to 2 weeks for a feature). -### Observability and Debugging: -Agentic AI systems need insight into where processes succeed or fail. 
Temporal provides end-to-end visibility and durable workflow history, which Temporal customers are using to track agentic flows and understand failure points. -### Simplified Error Handling: -Temporal abstracts failure management (e.g., retries, rollbacks) so developers can focus on AI logic rather than "plumbing" code. This is vital for agentic AI, where external interactions (e.g., APIs, data sources) are prone to failure. -### Flexibility for Experimentation: -For research-heavy agentic AI, Temporal supports dynamic, code-first workflows and easy integration of new signals/queries, aligning with researchers needs to iterate quickly on experimental paths. - -In essence, Temporal’s value lies in its ability to make agentic AI systems more reliable, scalable, and easier to develop by handling the underlying complexity of distributed workflows for both research and applied AI tasks. - -Temporal was built to solve the problems of distributed computing, including scalability, reliability, security, visibility, and complexity. Agentic AI systems are complex distributed systems, so Temporal should fit well. Scaling, security, and productionalization are major pain points in March 2025 for building agentic systems. +There are a lot of AI and Agentic AI tools out there, and more on the way. But why Temporal? Temporal gives this system reliability, state management, a code-first approach that we really like, built-in observability, and easy error handling. +For more, check out [architecture-decisions](./architecture-decisions.md). ## Setup and Configuration See [the Setup guide](./setup.md). 
diff --git a/adding-goals-and-tools.md b/adding-goals-and-tools.md index 634a27f..ec5a3f3 100644 --- a/adding-goals-and-tools.md +++ b/adding-goals-and-tools.md @@ -69,26 +69,25 @@ if tool_name == "CurrentPTO": return current_pto ``` -TODO probably update this it's out of date :point_down: -### Configuring the Starting Goal +### Existing Travel Goals The agent can be configured to pursue different goals using the `AGENT_GOAL` environment variable in your `.env` file. #### Goal: Find an event in Australia / New Zealand, book flights to it and invoice the user for the cost - `AGENT_GOAL=goal_event_flight_invoice` (default) - Helps users find events, book flights, and arrange train travel with invoice generation - - This is the scenario in the video above + - This is the scenario in the [original video](https://www.youtube.com/watch?v=GEXllEH2XiQ) #### Goal: Find a Premier League match, book train tickets to it and invoice the user for the cost - `AGENT_GOAL=goal_match_train_invoice` - Focuses on Premier League match attendance with train booking and invoice generation - - This is a new goal that is part of an upcoming conference talk + - This is a new goal that is part of the [Replay 2025 talk](https://www.youtube.com/watch?v=YDxAWrIBQNE). -If not specified, the agent defaults to `goal_event_flight_invoice`. Each goal comes with its own set of tools and conversation flows designed for specific use cases. You can examine `tools/goal_registry.py` to see the detailed configuration of each goal. +If not specified, the agent defaults to all goals. Every goal, including these two, comes with its own set of tools and conversation flows designed for specific use cases. You can examine `tools/goal_registry.py` to see the detailed configuration of each goal. -See the next section for tool configuration for each goal. +See the next section for tool configuration for these goals. 
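The goal selection described above comes down to one line in your `.env` file; a minimal sketch using the goal IDs listed above:

```shell
# .env: choose one of the goal IDs above; leave unset to default to all goals
AGENT_GOAL=goal_event_flight_invoice
```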
-### Configuring Existing Tools +#### Configuring Travel Goal Tools -#### Agent Goal: goal_event_flight_invoice (default) +##### Agent Goal: goal_event_flight_invoice (default) * The agent uses a mock function to search for events. This has zero configuration. * By default the agent uses a mock function to search for flights. * If you want to use the real flights API, go to `tools/search_flights.py` and replace the `search_flights` function with `search_flights_real_api` that exists in the same file. @@ -98,7 +97,7 @@ See the next section for tool configuration for each goal. * It's free to sign up and get a key at [Stripe](https://stripe.com/) * If you're lazy go to `tools/create_invoice.py` and replace the `create_invoice` function with the mock `create_invoice_example` that exists in the same file. -#### Agent Goal: goal_match_train_invoice +##### Agent Goal: goal_match_train_invoice * Finding a match requires a key from [Football Data](https://www.football-data.org). Sign up for a free account, then see the 'My Account' page to get your API token. Set `FOOTBALL_DATA_API_KEY` to this value. * If you're lazy go to `tools/search_fixtures.py` and replace the `search_fixtures` function with the mock `search_fixtures_example` that exists in the same file. diff --git a/architecture-decisions.md b/architecture-decisions.md new file mode 100644 index 0000000..7c146d1 --- /dev/null +++ b/architecture-decisions.md @@ -0,0 +1,33 @@ +# Architecture Decisions +This documents some of the "why" behind the [architecture](./architecture.md). + +## AI Models +We wanted to have flexibility to use different models, because this space is changing rapidly and models get better regularly. +We also wanted to let you pick your model of choice, so the system is designed to make swapping models simple. For how to do that, check out the [setup guide](./setup.md). 
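As a sketch of what model swapping can look like: selection driven by environment-style config rather than code edits. The `LLM_PROVIDER`/`LLM_MODEL` names and the `gpt-4o` default are illustrative assumptions, not this repo's actual configuration (`claude-3-7-sonnet-20250219` is a model named elsewhere in this changeset); see the setup guide for the real knobs.

```python
# Hypothetical sketch: pick the LLM from config so swapping models
# is a .env change, not a code change.
MODELS = {
    "openai": "gpt-4o",                         # assumed default model
    "anthropic": "claude-3-7-sonnet-20250219",  # model named in the todo list
}

def pick_model(env: dict) -> str:
    # An explicit LLM_MODEL wins; otherwise fall back to the provider default.
    provider = env.get("LLM_PROVIDER", "openai")
    return env.get("LLM_MODEL", MODELS.get(provider, MODELS["openai"]))

print(pick_model({}))                             # gpt-4o
print(pick_model({"LLM_PROVIDER": "anthropic"}))  # claude-3-7-sonnet-20250219
```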
+ +## Temporal +We asked one of the AI models used in this demo why Temporal is a good fit for agentic AI (answer lightly edited): + +### Reliability and State Management: + Temporal ensures durability and fault tolerance, which are critical for agentic AI systems that involve long-running, complex workflows. For example, it preserves application state across failures, allowing AI agents to resume from where they left off without losing progress. Major AI companies use this for research experiments and agentic flows, where reliability is essential for continuous exploration. +### Handling Complex, Dynamic Workflows: +Agentic AI often involves unpredictable, multi-step processes like web crawling or data searching. Temporal’s workflow orchestration simplifies managing these tasks by abstracting complexity, providing features like retries, timeouts, and signals/queries. Temporal makes observability and resuming failed complex experiments and deep searches simple. +### Scalability and Speed: +Temporal enables rapid development and scaling, crucial for AI systems handling large-scale experiments or production workloads. AI model deployment and SRE teams use it to get code to production quickly with scale as a focus, while research teams can (and do!) run hundreds of experiments daily. Temporal customers report a significant reduction in development time (e.g., 20 weeks to 2 weeks for a feature). +### Observability and Debugging: +Agentic AI systems need insight into where processes succeed or fail. Temporal provides end-to-end visibility and durable workflow history, which Temporal customers are using to track agentic flows and understand failure points. +### Simplified Error Handling: +Temporal abstracts failure management (e.g., retries, rollbacks) so developers can focus on AI logic rather than "plumbing" code. This is vital for agentic AI, where external interactions (e.g., APIs, data sources) are prone to failure. 
+### Flexibility for Experimentation: +For research-heavy agentic AI, Temporal supports dynamic, code-first workflows and easy integration of new signals/queries, aligning with researchers' needs to iterate quickly on experimental paths. + +In essence, Temporal’s value lies in its ability to make agentic AI systems more reliable, scalable, and easier to develop by handling the underlying complexity of distributed workflows for both research and applied AI tasks. + +Temporal was built to solve the problems of distributed computing, including scalability, reliability, security, visibility, and complexity. Agentic AI systems are complex distributed systems, so Temporal should fit well. Scaling, security, and productionization are major pain points in March 2025 for building agentic systems. + +In this system Temporal lets you: +- Orchestrate interactions across distributed data stores and tools
+- Hold state, potentially over long periods of time
+- ‘Self-heal’ and retry until the (probabilistic) LLM returns valid data
+- Support human intervention such as approvals
+- Run data retrieval and tool use in parallel for efficiency
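The self-heal-and-retry point deserves a sketch: treat invalid LLM output as just another retryable failure. This standalone Python sketch mimics what a Temporal RetryPolicy on an activity gives you for free; the flaky LLM stub and function names here are hypothetical, not this repo's code.

```python
import json

def flaky_llm(prompt: str, _state={"calls": 0}) -> str:
    # Hypothetical stand-in for an LLM call: returns malformed
    # JSON on the first attempt, valid JSON afterwards.
    _state["calls"] += 1
    return "oops, not json" if _state["calls"] == 1 else '{"city": "Sydney"}'

def call_until_valid(prompt: str, max_attempts: int = 5) -> dict:
    # Retry until the (probabilistic) LLM returns parseable output,
    # the same pattern a Temporal RetryPolicy applies to a failing activity.
    last_err = None
    for _ in range(max_attempts):
        raw = flaky_llm(prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err  # invalid output is treated as a retryable failure
    raise RuntimeError(f"no valid LLM output after {max_attempts} attempts") from last_err

print(call_until_valid("Which city hosts the event?"))  # {'city': 'Sydney'}
```

In the actual system, Temporal also persists the conversation state across these retries, so a crash mid-retry resumes rather than restarts.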
\ No newline at end of file diff --git a/setup.md b/setup.md index 5f20618..1afa4c1 100644 --- a/setup.md +++ b/setup.md @@ -1,4 +1,5 @@ -## Configuration +# Setup Guide +## Initial Configuration This application uses `.env` files for configuration. Copy the [.env.example](.env.example) file to `.env` and update the values: @@ -6,6 +7,12 @@ This application uses `.env` files for configuration. Copy the [.env.example](.e cp .env.example .env ``` +Then add API keys and other configuration as desired. +If you want to show confirmations and enable the debugging UI, set +```bash +SHOW_CONFIRM=True +``` + ### Agent Goal Configuration The agent can be configured to pursue different goals using the `AGENT_GOAL` environment variable in your `.env` file. @@ -173,4 +180,6 @@ If you're running your train API above on a different host/port then change the - `tool_registry.py` contains the mapping of tool names to tool definitions (so the AI understands how to use them) - `goal_registry.py` contains descriptions of goals and the tools used to achieve them - The tools themselves are defined in their own files in `/tools` -- Note the mapping in `tools/__init__.py` to each tool \ No newline at end of file +- Note the mapping in `tools/__init__.py` to each tool + +For more details, check out the [adding goals and tools guide](./adding-goals-and-tools.md). \ No newline at end of file diff --git a/todo.md b/todo.md index e1219ea..a30eeed 100644 --- a/todo.md +++ b/todo.md @@ -1,34 +1,25 @@ # todo list -[ ] add confirmation env setting to setup guide
+[x] add confirmation env setting to setup guide

-[x] how to add more scenarios, tools
-[ ] make agent respond to name of goals and not just numbers -[ ] L look at slides -[ ] josh to do fintech scenarios -[ ] create tests
-[ ] fix logging statements not to be all warn, maybe set logging level to info +[ ] try claude-3-7-sonnet-20250219, see [tool_activities.py](./activities/tool_activities.py)
+[ ] make agent respond to name of goals and not just numbers
+[x] L look at slides
+[ ] josh to do fintech scenarios
+[ ] expand [tests](./tests/agent_goal_workflow_test.py)
+[x] fix logging statements not to be all warn, maybe set logging level to info
-[ ] create people management scenarios
+[x] create people management scenarios
-[ ] 2. Others HR goals: +[ ] 2. Other HR goals:
-- book work travel
-- check insurance coverages
-- expense management
-- check in on the health of the team
-[x] demo the reasons why:
-- Orchestrate interactions across distributed data stores and tools
-- Hold state, potentially over long periods of time
-- Ability to ‘self-heal’ and retry until the (probabilistic) LLM returns valid data
-- Support for human intervention such as approvals
-- Parallel processing for efficiency of data retrieval and tool use
[ ] ask the ai agent how it did at the end of the conversation, was it efficient? successful? insert a search attribute to document that before return - Insight into the agent’s performance
-[x] customize prompts in [workflow to manage scenario](./workflows/tool_workflow.py)
-[x] add in new tools?
- [ ] non-retry the api key error - "Invalid API Key provided: sk_test_**J..." and "AuthenticationError"
[ ] make it so you can yeet yourself out of a goal and pick a new one
diff --git a/workflows/agent_goal_workflow.py b/workflows/agent_goal_workflow.py index d52e9ab..4699e86 100644 --- a/workflows/agent_goal_workflow.py +++ b/workflows/agent_goal_workflow.py @@ -259,9 +259,10 @@ class AgentGoalWorkflow: for listed_goal in goal_list: if listed_goal.id == goal: self.goal = listed_goal - # self.goal = goals.get(goal) workflow.logger.info("Changed goal to " + goal) - #todo reset goal or tools if this doesn't work or whatever + if self.goal is None: + workflow.logger.warning("Goal not set after goal change; the requested goal may be missing from the goal list.") + # workflow function that defines if chat should end def chat_should_end(self) -> bool: