Temporal tests (#40)

* temporal tests

* codex setup env script to readme
Steve Androulakis
2025-05-29 12:56:58 -07:00
committed by GitHub
parent f7ef2b1c7e
commit e35181b5ad
9 changed files with 1832 additions and 7 deletions


@@ -46,11 +46,44 @@ See [the guide to adding goals and tools](./adding-goals-and-tools.md).
## Architecture
See [the architecture guide](./architecture.md).
## Testing
The project includes comprehensive tests for workflows and activities using Temporal's testing framework:
```bash
# Install dependencies including test dependencies
poetry install --with dev
# Run all tests
poetry run pytest
# Run with time-skipping for faster execution
poetry run pytest --workflow-environment=time-skipping
```
**Test Coverage:**
- **Workflow Tests**: AgentGoalWorkflow signals, queries, state management
- **Activity Tests**: ToolActivities, LLM integration (mocked), environment configuration
- **Integration Tests**: End-to-end workflow and activity execution
**Documentation:**
- **Quick Start**: [TESTING.md](TESTING.md) - Simple commands to run tests
- **Comprehensive Guide**: [tests/README.md](tests/README.md) - Detailed testing documentation, patterns, and best practices
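The "mocked LLM" activity tests above can be sketched with `unittest.mock`. This is an illustrative stand-in, not the project's actual code: the `ToolActivities` class and `prompt_llm` method here are simplified assumptions about the real activity's shape.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the project's LLM-calling activity;
# the real class and method signatures may differ.
class ToolActivities:
    def __init__(self, llm_client):
        self.llm_client = llm_client

    def prompt_llm(self, prompt: str) -> str:
        # In the real activity this would call the configured LLM provider.
        response = self.llm_client.complete(prompt)
        return response.strip()

def test_prompt_llm_with_mocked_client():
    mock_client = MagicMock()
    mock_client.complete.return_value = "  mocked answer  "
    activities = ToolActivities(mock_client)

    result = activities.prompt_llm("What tools are available?")

    # The mock lets the test assert on the call without hitting a real LLM.
    mock_client.complete.assert_called_once_with("What tools are available?")
    assert result == "mocked answer"
```

Mocking the client keeps these tests fast and deterministic, so they can run in CI without API keys.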
## Development
Install dependencies:
```bash
poetry install
```
To start the Temporal Server and the API server, see [setup](setup.md).
## Productionalization & Adding Features
- In a prod setting, I would need to ensure that payload data is stored separately (e.g. in S3 or a NoSQL db, the claim-check pattern) or otherwise garbage-collected. Without these techniques, long conversations will fill up the workflow's conversation history and eventually breach Temporal's event history payload limits.
- A single worker can easily support many agent workflows (chats) running at the same time. Currently the workflow ID is the same each time, so it will only run one agent at a time. To run multiple agents, you can use a different workflow ID each time (e.g. by using a UUID or timestamp).
- Perhaps the UI should show when the LLM response is being retried (i.e. an activity retry attempt triggered by the LLM returning bad output).
- The project now includes comprehensive tests for workflows and activities! [See the testing guide](TESTING.md) and [the tests themselves](./tests/).
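The claim-check idea above can be sketched in a few lines. Here an in-memory dict stands in for S3 or a NoSQL store, and all names (`check_in`, `check_out`, the key format) are illustrative, not part of this project:

```python
import json
import uuid

# In-memory stand-in for an external blob store such as S3 or a NoSQL db.
BLOB_STORE: dict[str, bytes] = {}

def check_in(payload: dict) -> str:
    """Store the full payload externally and return a small claim-check key."""
    key = f"payload-{uuid.uuid4()}"
    BLOB_STORE[key] = json.dumps(payload).encode()
    return key

def check_out(key: str) -> dict:
    """Resolve a claim-check key back into the full payload."""
    return json.loads(BLOB_STORE[key].decode())
```

The workflow's conversation history would then carry only small keys like `payload-<uuid>` instead of full LLM responses, keeping Temporal event history payloads well under the limits.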
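Running multiple agents concurrently only requires a unique workflow ID per chat. A minimal sketch, assuming the temporalio client and a hypothetical task queue name:

```python
import uuid

def agent_workflow_id() -> str:
    """Generate a unique workflow ID so each chat gets its own workflow."""
    return f"agent-chat-{uuid.uuid4()}"

# Starting a workflow with a unique ID; the workflow and task-queue
# names here are assumptions about this project's setup.
async def start_agent(client, goal):
    return await client.start_workflow(
        "AgentGoalWorkflow",
        goal,
        id=agent_workflow_id(),
        task_queue="agent-task-queue",
    )
```

With a fixed ID, a second start attempt fails (or attaches to the running workflow, depending on the ID reuse policy); a UUID- or timestamp-based ID sidesteps that entirely.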
See [the todo](./todo.md) for more details.