# Testing your skill
Skills are Python modules. `handle()` and `resolve_human()` are plain functions that take dicts and return a JSON string (or, for `resolve_human`, an approval level or `None`) — you can unit test them directly without starting the agent, connecting to Slack, or touching the LLM.
## Why test skills

The agent framework handles routing, LLM interaction, and approval gates. Your skill’s job is simpler: receive a tool name and a dict of inputs, do something, and return a JSON string. Because the interface is this narrow, unit tests are fast and reliable.
Testing handle() directly lets you:
- Verify correct JSON is returned for each tool
- Check that config values are read and used properly
- Assert that memory writes happen under the right conditions
- Confirm error paths return well-formed error objects
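Concretely, the surface under test is small. Here is a sketch of the shape, inferred from how the tests on this page call it — your real module lives in the skill's `tools.py`, and its internals will differ:

```python
import json


def handle(tool_name: str, inputs: dict, ctx: dict) -> str:
    """Dispatch one tool call and return a JSON string."""
    if tool_name == "list_services":
        services = ctx.get("config", {}).get("services", {})
        if not services:
            # Well-formed "nothing configured" result rather than an empty dict
            return json.dumps({"message": "No services are configured."})
        return json.dumps({"services": services})
    # ...other tools...
    return json.dumps({"error": f"Unknown tool: {tool_name}"})
```

Because the return value is always a JSON string, every test follows the same pattern: call `handle`, `json.loads` the result, assert on the dict.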
Testing resolve_human() directly lets you verify your approval logic without running the full agent loop — important because approval bugs are silent (a missed "approve" means a destructive action runs automatically).
## The test setup

Every tool handler receives a `ctx` dict built by the engine. In tests, construct it manually:

```python
ctx = {
    "config": {
        "services": {
            "checkout": {"url": "https://checkout.internal/health"},
            "billing": {"url": "https://billing.internal/health"},
        }
    },
    "memory": None,
    "state": None,
    "channel_id": "C123",
    "user_id": "U456",
    "logger": None,
}
```

The real `ctx` is built by `build_context()` in `src/mithai/core/context.py`. The fields are:
| Key | Type | What it is |
|---|---|---|
| `config` | `dict` | The `skills.config.<skill_name>` block from `config.yaml` |
| `memory` | `MemoryBackend \| None` | Persistent memory store |
| `state` | `StateBackend \| None` | Persistent key-value store |
| `channel_id` | `str` | Channel the message came from |
| `user_id` | `str` | Who sent the message |
| `logger` | `Logger \| None` | Python logger (safe to pass `None` in tests) |
Your skill should always call `ctx.get("memory")` rather than `ctx["memory"]` and check for `None` before using it. Tests where you pass `None` will surface any missing guard.
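For example, a hypothetical `log_restart` helper (the `write` signature follows the in-memory stub shown later on this page) degrades to a no-op instead of raising when no backend is configured:

```python
def log_restart(ctx: dict, service: str) -> None:
    """Append a restart record to memory, if a backend is configured."""
    memory = ctx.get("memory")  # .get(): the key may be absent entirely
    if memory is None:          # and a present key may still hold None
        return
    memory.write("restarts.md", f"- restarted {service}\n", append=True)


# Both of these are safe no-ops:
log_restart({"memory": None}, "auth")
log_restart({}, "auth")
```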
## Testing handle()

The services skill from the tutorial (`skills/services/tools.py`) exports three tools. Here is how to test each one.

### list_services

```python
import json

from skills.services.tools import handle


def test_list_services_returns_configured_services():
    ctx = {
        "config": {
            "services": {
                "checkout": {"url": "https://checkout.internal/health"},
            }
        },
        "memory": None,
        "state": None,
        "channel_id": "C1",
        "user_id": "U1",
        "logger": None,
    }
    result = json.loads(handle("list_services", {}, ctx))
    assert "checkout" in result["services"]
    assert result["services"]["checkout"]["url"] == "https://checkout.internal/health"
```
```python
def test_list_services_empty_config():
    ctx = {
        "config": {},
        "memory": None,
        "state": None,
        "channel_id": "C1",
        "user_id": "U1",
        "logger": None,
    }
    result = json.loads(handle("list_services", {}, ctx))
    assert "message" in result
```

### check_health

Testing check_health involves an HTTP call, so mock the HTTP layer rather than hitting the network (see Mocking HTTP below):
```python
from unittest.mock import patch, MagicMock


def test_check_health_healthy_service():
    mock_response = MagicMock()
    mock_response.status = 200
    mock_response.__enter__ = lambda s: s
    mock_response.__exit__ = MagicMock(return_value=False)

    with patch("urllib.request.urlopen", return_value=mock_response):
        result = json.loads(handle(
            "check_health",
            {"service": "checkout", "url": "https://checkout.internal/health"},
            make_ctx(),
        ))

    assert result["healthy"] is True
    assert result["status"] == 200
    assert result["service"] == "checkout"


def test_check_health_unreachable_service():
    import urllib.error

    with patch("urllib.request.urlopen", side_effect=urllib.error.URLError("timed out")):
        result = json.loads(handle(
            "check_health",
            {"service": "auth", "url": "https://auth.internal/health"},
            make_ctx(),
        ))

    assert result["healthy"] is False
    assert "error" in result
```

### restart_service

```python
def test_restart_service_returns_success():
    ctx = make_ctx()
    result = json.loads(handle(
        "restart_service",
        {"service": "auth", "environment": "staging"},
        ctx,
    ))
    assert result["restarted"] is True
    assert result["service"] == "auth"
    assert result["environment"] == "staging"
```

## Testing resolve_human()

`resolve_human` takes the same arguments as `handle` but returns a string (`"approve"`, `"confirm"`) or `None`. Test every branch.
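The branch logic the tests below pin down might look like this — a sketch consistent with those assertions, not necessarily the tutorial's exact code:

```python
def resolve_human(tool_name: str, inputs: dict, ctx: dict):
    """Return an approval level for a tool call, or None to auto-execute."""
    if tool_name == "restart_service" and inputs.get("environment") == "production":
        return "approve"  # destructive action in production: require an explicit click
    return None  # read-only tools and staging restarts run without a gate
```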
```python
from skills.services.tools import resolve_human


def test_production_restart_requires_approval():
    level = resolve_human(
        "restart_service",
        {"service": "auth", "environment": "production"},
        {},
    )
    assert level == "approve"


def test_staging_restart_is_auto_execute():
    level = resolve_human(
        "restart_service",
        {"service": "auth", "environment": "staging"},
        {},
    )
    assert level is None


def test_non_restart_tools_are_auto_execute():
    for tool in ["list_services", "check_health"]:
        level = resolve_human(tool, {}, {})
        assert level is None, f"Expected None for {tool}, got {level!r}"
```

## Mocking HTTP

The services skill uses `urllib.request.urlopen`. Patch it at the call site — the module path where it is used, not where it is defined.
```python
from unittest.mock import patch, MagicMock


def make_mock_response(status: int):
    resp = MagicMock()
    resp.status = status
    resp.__enter__ = lambda s: s
    resp.__exit__ = MagicMock(return_value=False)
    return resp


def test_check_health_503():
    with patch("skills.services.tools.urllib.request.urlopen", return_value=make_mock_response(503)):
        result = json.loads(handle(
            "check_health",
            {"service": "auth", "url": "https://auth.internal/health"},
            make_ctx(),
        ))
    assert result["healthy"] is False
    assert result["status"] == 503
```

If your skill makes HTTP calls with `requests` instead of stdlib `urllib`, add the `responses` library as a dev dependency and use its `@responses.activate` decorator:

```python
import responses as resp_mock


@resp_mock.activate
def test_check_health_with_responses_library():
    resp_mock.add(resp_mock.GET, "https://checkout.internal/health", status=200)
    result = json.loads(handle(
        "check_health",
        {"service": "checkout", "url": "https://checkout.internal/health"},
        make_ctx(),
    ))
    assert result["healthy"] is True
```

## Mocking memory

Pass a real `FilesystemMemoryBackend` pointed at a `tmp_path`, or write a minimal in-memory stub.
### Option 1: FilesystemMemoryBackend on tmp_path (recommended)

```python
import pytest

from mithai.memory.filesystem import FilesystemMemoryBackend


@pytest.fixture
def mem(tmp_path):
    return FilesystemMemoryBackend(tmp_path / "memory")


def test_restart_writes_to_memory(mem):
    ctx = {**make_ctx(), "memory": mem}
    handle("restart_service", {"service": "auth", "environment": "production"}, ctx)
    content = mem.read("restarts.md")
    assert content is not None
    assert "auth" in content
```

### Option 2: In-memory stub

When you want tests that never touch disk:
```python
class InMemoryMemory:
    def __init__(self):
        self._store = {}

    def read(self, path):
        return self._store.get(path)

    def write(self, path, content, *, append=False):
        if append:
            self._store[path] = self._store.get(path, "") + content
        else:
            self._store[path] = content

    def exists(self, path):
        return path in self._store
```

Use it the same way:

```python
def test_restart_appends_to_log():
    mem = InMemoryMemory()
    ctx = {**make_ctx(), "memory": mem}
    handle("restart_service", {"service": "billing", "environment": "staging"}, ctx)
    assert "billing" in (mem.read("restarts.md") or "")
```

## Running tests

Place skill tests under `tests/skills/`:

```
tests/
└── skills/
    └── test_services.py
```

Run the full suite:

```sh
uv run pytest tests/ -v
```

Run only skill tests:

```sh
uv run pytest tests/skills/ -v
```

Run a single test:

```sh
uv run pytest tests/skills/test_services.py::test_check_health_healthy_service -v
```

## Complete test file

Here is a full test file for the services skill, covering all three tools and both approval branches.
`tests/skills/test_services.py`:

```python
"""Tests for the services skill."""

import json
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch

import pytest

# Add skills directory to path so the module can be imported directly
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "skills"))

from services.tools import handle, resolve_human  # noqa: E402

# ── helpers ──────────────────────────────────────────────────────────────────


def make_ctx(services=None, memory=None):
    # "if ... is not None" rather than "or": an explicit services={} must
    # stay empty instead of being replaced by the defaults
    return {
        "config": {
            "services": services if services is not None else {
                "checkout": {"url": "https://checkout.internal/health"},
                "billing": {"url": "https://billing.internal/health"},
                "auth": {"url": "https://auth.internal/health"},
            }
        },
        "memory": memory,
        "state": None,
        "channel_id": "C123",
        "user_id": "U456",
        "logger": None,
    }


def make_mock_response(status: int):
    resp = MagicMock()
    resp.status = status
    resp.__enter__ = lambda s: s
    resp.__exit__ = MagicMock(return_value=False)
    return resp


# ── list_services ─────────────────────────────────────────────────────────────


def test_list_services_returns_all_configured_services():
    result = json.loads(handle("list_services", {}, make_ctx()))
    assert set(result["services"].keys()) == {"checkout", "billing", "auth"}


def test_list_services_empty_config_returns_message():
    result = json.loads(handle("list_services", {}, make_ctx(services={})))
    assert "message" in result
    assert "services" not in result


# ── check_health ──────────────────────────────────────────────────────────────


def test_check_health_healthy_200():
    with patch("services.tools.urllib.request.urlopen", return_value=make_mock_response(200)):
        result = json.loads(handle(
            "check_health",
            {"service": "checkout", "url": "https://checkout.internal/health"},
            make_ctx(),
        ))
    assert result["healthy"] is True
    assert result["status"] == 200
    assert result["service"] == "checkout"
    assert "response_ms" in result


def test_check_health_unhealthy_503():
    with patch("services.tools.urllib.request.urlopen", return_value=make_mock_response(503)):
        result = json.loads(handle(
            "check_health",
            {"service": "auth", "url": "https://auth.internal/health"},
            make_ctx(),
        ))
    assert result["healthy"] is False
    assert result["status"] == 503


def test_check_health_url_error():
    import urllib.error

    with patch("services.tools.urllib.request.urlopen", side_effect=urllib.error.URLError("connection refused")):
        result = json.loads(handle(
            "check_health",
            {"service": "billing", "url": "https://billing.internal/health"},
            make_ctx(),
        ))
    assert result["healthy"] is False
    assert "error" in result


def test_check_health_looks_up_url_from_config():
    """check_health should use ctx config when no URL is provided."""
    with patch("services.tools.urllib.request.urlopen", return_value=make_mock_response(200)):
        result = json.loads(handle(
            "check_health",
            {"service": "checkout"},  # no url field
            make_ctx(),
        ))
    assert result["healthy"] is True


# ── restart_service ───────────────────────────────────────────────────────────


def test_restart_service_returns_success():
    result = json.loads(handle(
        "restart_service",
        {"service": "auth", "environment": "staging"},
        make_ctx(),
    ))
    assert result["restarted"] is True
    assert result["service"] == "auth"
    assert result["environment"] == "staging"


def test_restart_service_writes_to_memory(tmp_path):
    from mithai.memory.filesystem import FilesystemMemoryBackend

    mem = FilesystemMemoryBackend(tmp_path / "memory")
    ctx = {**make_ctx(), "memory": mem}
    handle("restart_service", {"service": "auth", "environment": "production"}, ctx)
    content = mem.read("restarts.md")
    assert content is not None
    assert "auth" in content


# ── resolve_human ─────────────────────────────────────────────────────────────


def test_production_restart_requires_approval():
    level = resolve_human(
        "restart_service", {"service": "auth", "environment": "production"}, {}
    )
    assert level == "approve"


def test_staging_restart_is_auto_execute():
    level = resolve_human(
        "restart_service", {"service": "auth", "environment": "staging"}, {}
    )
    assert level is None


def test_read_only_tools_are_auto_execute():
    for tool in ["list_services", "check_health"]:
        assert resolve_human(tool, {}, {}) is None
```

## Testing approval flow end-to-end

Unit tests verify your logic. They cannot test the full approval flow — sending a Slack message with Approve/Deny buttons, waiting for a click, and routing the result back to the engine.
For end-to-end approval testing, use `mithai chat`:

```sh
mithai chat
```

`mithai chat` runs the full engine loop in the terminal. When a tool requires approval, it prints the approval prompt and waits for you to type `approve` or `deny`. This exercises `resolve_human`, the Human MCP gate, `handle`, and the session recording path together.
For automated integration tests of the engine, see how `tests/test_human_mcp.py` constructs a minimal `HumanMCP` instance with a mock adapter — the same pattern applies if you need to test approval routing in CI.