Files
temporal-ai-agent/tests/README.md
Dan Davison 68ac9c40eb Migrate to uv (#52)
* uvx migrate-to-uv

* uv migration

* Fix hatch build

* Fixup

* uv run

* Add tab completion to devcontainer uv

Co-authored-by: Simon Emms <simon@simonemms.com>

* Revert "Add tab completion to devcontainer uv"

This reverts commit a3b7bdd84b.

---------

Co-authored-by: Simon Emms <simon@simonemms.com>
2025-07-30 11:37:42 -06:00

347 lines
9.2 KiB
Markdown

# Temporal AI Agent - Testing Guide
This directory contains comprehensive tests for the Temporal AI Agent project. The tests cover workflows, activities, and integration scenarios using Temporal's testing framework.
## Test Structure
```
tests/
├── README.md # This file - testing documentation
├── conftest.py # Test configuration and fixtures
├── test_agent_goal_workflow.py # Workflow tests
├── test_tool_activities.py # Activity tests
└── workflowtests/ # Legacy workflow tests
└── agent_goal_workflow_test.py
```
## Test Types
### 1. Workflow Tests (`test_agent_goal_workflow.py`)
Tests the main `AgentGoalWorkflow` class covering:
- **Workflow Initialization**: Basic workflow startup and state management
- **Signal Handling**: Testing user_prompt, confirm, end_chat signals
- **Query Methods**: Testing all workflow query endpoints
- **State Management**: Conversation history, goal changes, tool data
- **Validation Flow**: Prompt validation and error handling
- **Tool Execution Flow**: Confirmation and tool execution cycles
### 2. Activity Tests (`test_tool_activities.py`)
Tests the `ToolActivities` class and `dynamic_tool_activity` function:
- **LLM Integration**: Testing agent_toolPlanner with mocked LLM responses
- **Validation Logic**: Testing agent_validatePrompt with various scenarios
- **Environment Configuration**: Testing get_wf_env_vars with different env setups
- **JSON Processing**: Testing response parsing and sanitization
- **Dynamic Tool Execution**: Testing the dynamic activity dispatcher
- **Integration**: End-to-end activity execution in Temporal workers
### 3. Configuration Tests (`conftest.py`)
Provides shared test fixtures and configuration:
- **Temporal Environment**: Local and time-skipping test environments
- **Sample Data**: Pre-configured agent goals, conversation history, inputs
- **Test Client**: Configured Temporal client for testing
## Running Tests
### Prerequisites
Ensure you have the required dependencies installed:
```bash
uv sync
```
### Basic Test Execution
Run all tests:
```bash
uv run pytest
```
Run specific test files:
```bash
# Workflow tests only
uv run pytest tests/test_agent_goal_workflow.py
# Activity tests only
uv run pytest tests/test_tool_activities.py
# Legacy tests
uv run pytest tests/workflowtests/
```
Run with verbose output:
```bash
uv run pytest -v
```
### Test Environment Options
The tests support different Temporal environments via the `--workflow-environment` flag:
#### Local Environment (Default)
Uses a local Temporal test server:
```bash
uv run pytest --workflow-environment=local
```
#### Time-Skipping Environment
Uses Temporal's time-skipping test environment for faster execution:
```bash
uv run pytest --workflow-environment=time-skipping
```
#### External Server
Connect to an existing Temporal server:
```bash
uv run pytest --workflow-environment=localhost:7233
```
#### Setup Script for AI Agent environments such as OpenAI Codex
```bash
export SHELL=/bin/bash
curl -LsSf https://astral.sh/uv/install.sh | sh
export PATH="$HOME/.local/bin:$PATH"
ls
uv sync
cd frontend
npm install
cd ..
# Pre-download the temporal test server binary
uv run python -c "
import asyncio
import sys
from temporalio.testing import WorkflowEnvironment
async def predownload():
try:
print('Starting test server download...')
env = await WorkflowEnvironment.start_time_skipping()
print('Test server downloaded and started successfully')
await env.shutdown()
print('Test server shut down successfully')
except Exception as e:
print(f'Error during download: {e}')
sys.exit(1)
asyncio.run(predownload())
"
```
### Filtering Tests
Run tests by pattern:
```bash
# Run only validation tests
uv run pytest -k "validation"
# Run only workflow tests
uv run pytest -k "workflow"
# Run only activity tests
uv run pytest -k "activity"
```
Run tests by marker (if you add custom markers):
```bash
# Run only integration tests
uv run pytest -m integration
# Skip slow tests
uv run pytest -m "not slow"
```
## Test Configuration
### Test Discovery
The `vibe/` directory is excluded from test collection to avoid conflicts with sample tests. This is configured in `pyproject.toml`:
```toml
[tool.pytest.ini_options]
norecursedirs = ["vibe"]
```
### Environment Variables
Tests respect the following environment variables:
- `LLM_MODEL`: Model to use for LLM testing (defaults to "openai/gpt-4")
- `LLM_KEY`: API key for LLM service
- `LLM_BASE_URL`: Custom base URL for LLM service
- `SHOW_CONFIRM`: Whether to show confirmation dialogs
- `AGENT_GOAL`: Default agent goal setting
### Mocking Strategy
The tests use extensive mocking to avoid external dependencies:
- **LLM Calls**: Mocked using `unittest.mock` to avoid actual API calls
- **Tool Handlers**: Mocked to test workflow logic without tool execution
- **Environment Variables**: Patched for consistent test environments
## Writing New Tests
### Test Naming Convention
- Test files: `test_<module_name>.py`
- Test classes: `Test<ClassName>`
- Test methods: `test_<functionality>_<scenario>`
Example:
```python
class TestAgentGoalWorkflow:
async def test_user_prompt_signal_valid_input(self, client, sample_combined_input):
# Test implementation
pass
```
### Using Fixtures
Leverage the provided fixtures for consistent test data:
```python
async def test_my_workflow(self, client, sample_agent_goal, sample_conversation_history):
# client: Temporal test client
# sample_agent_goal: Pre-configured AgentGoal
# sample_conversation_history: Sample conversation data
pass
```
### Mocking External Dependencies
Always mock external services:
```python
@patch('activities.tool_activities.completion')
async def test_llm_integration(self, mock_completion):
mock_completion.return_value.choices[0].message.content = '{"test": "response"}'
# Test implementation
```
### Testing Workflow Signals and Queries
```python
async def test_workflow_signal(self, client, sample_combined_input):
# Start workflow
handle = await client.start_workflow(
AgentGoalWorkflow.run,
sample_combined_input,
id=str(uuid.uuid4()),
task_queue=task_queue_name,
)
# Send signal
await handle.signal(AgentGoalWorkflow.user_prompt, "test message")
# Query state
conversation = await handle.query(AgentGoalWorkflow.get_conversation_history)
# End workflow
await handle.signal(AgentGoalWorkflow.end_chat)
result = await handle.result()
```
## Test Data and Fixtures
### Sample Agent Goal
The `sample_agent_goal` fixture provides a basic agent goal with:
- Goal ID: "test_goal"
- One test tool with a required string argument
- Suitable for most workflow testing scenarios
### Sample Conversation History
The `sample_conversation_history` fixture provides:
- Basic user and agent message exchange
- Proper message format for testing
### Sample Combined Input
The `sample_combined_input` fixture provides:
- Complete workflow input with agent goal and tool params
- Conversation summary and prompt queue
- Ready for workflow execution
## Debugging Tests
### Verbose Logging
Enable detailed logging:
```bash
uv run pytest --log-cli-level=DEBUG -s
```
### Temporal Web UI
When using local environment, access Temporal Web UI at http://localhost:8233 to inspect workflow executions during tests.
### Test Isolation
Each test uses unique task queue names to prevent interference:
```python
task_queue_name = str(uuid.uuid4())
```
## Continuous Integration
### GitHub Actions Example
```yaml
name: Test
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- run: uv sync
- run: uv run pytest --workflow-environment=time-skipping
```
### Test Coverage
Generate coverage reports:
```bash
uv add --group dev pytest-cov
uv run pytest --cov=workflows --cov=activities --cov-report=html
```
## Best Practices
1. **Mock External Dependencies**: Always mock LLM calls, file I/O, and network requests
2. **Use Time-Skipping**: For CI/CD, prefer time-skipping environment for speed
3. **Unique Identifiers**: Use UUIDs for workflow IDs and task queues
4. **Clean Shutdown**: Always end workflows properly in tests
5. **Descriptive Names**: Use clear, descriptive test names
6. **Test Edge Cases**: Include error scenarios and validation failures
7. **Keep Tests Fast**: Use mocks to avoid slow external calls
8. **Isolate Tests**: Ensure tests don't depend on each other
## Troubleshooting
### Common Issues
1. **Workflow Timeout**: Increase timeouts or use time-skipping environment
2. **Mock Not Working**: Check patch decorators and import paths
3. **Test Hanging**: Ensure workflows are properly ended with signals
4. **Environment Issues**: Check environment variable settings
### Getting Help
- Check Temporal Python SDK documentation
- Review existing test patterns in the codebase
- Use `uv run pytest --collect-only` to verify test discovery
- Run with `-v` flag for detailed output
## Legacy Tests
The `workflowtests/` directory contains legacy tests. New tests should be added to the main `tests/` directory following the patterns established in this guide.