
Testing your skill

Skills are Python modules. handle() and resolve_human() are plain functions that take a dict and return a string — you can unit test them directly without starting the agent, connecting to Slack, or touching the LLM.



The agent framework handles routing, LLM interaction, and approval gates. Your skill’s job is simpler: receive a tool name and a dict of inputs, do something, and return a JSON string. Because the interface is this narrow, unit tests are fast and reliable.

Testing handle() directly lets you:

  • Verify correct JSON is returned for each tool
  • Check that config values are read and used properly
  • Assert that memory writes happen under the right conditions
  • Confirm error paths return well-formed error objects

Testing resolve_human() directly lets you verify your approval logic without running the full agent loop — important because approval bugs are silent (a missed "approve" means a destructive action runs automatically).


Every tool handler receives a ctx dict built by the engine. In tests, construct it manually:

```python
ctx = {
    "config": {
        "services": {
            "checkout": {"url": "https://checkout.internal/health"},
            "billing": {"url": "https://billing.internal/health"},
        }
    },
    "memory": None,
    "state": None,
    "channel_id": "C123",
    "user_id": "U456",
    "logger": None,
}
```

The real ctx is built by build_context() in src/mithai/core/context.py. The fields are:

| Key | Type | What it is |
| --- | --- | --- |
| config | dict | The skills.config.<skill_name> block from config.yaml |
| memory | MemoryBackend \| None | Persistent memory store |
| state | StateBackend \| None | Persistent key-value store |
| channel_id | str | Channel the message came from |
| user_id | str | Who sent the message |
| logger | Logger \| None | Python logger (safe to pass None in tests) |

Your skill should always call ctx.get("memory") rather than ctx["memory"] and check for None before using it. Tests where you pass None will surface any missing guard.
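A sketch of that guard inside a handler — the "audit.md" path and the handler body are illustrative, not from the services skill:

```python
import json

def handle(tool: str, args: dict, ctx: dict) -> str:
    memory = ctx.get("memory")  # None in tests that don't exercise memory
    if memory is not None:
        memory.write("audit.md", f"{tool} ran\n", append=True)
    return json.dumps({"ok": True})
```

A test that passes `"memory": None` (or omits the key entirely) will crash on `ctx["memory"].write(...)` but pass cleanly with the guarded version.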


The services skill from the tutorial (skills/services/tools.py) exports three tools. Here is how to test each one.

```python
import json

from skills.services.tools import handle

def test_list_services_returns_configured_services():
    ctx = {
        "config": {
            "services": {
                "checkout": {"url": "https://checkout.internal/health"},
            }
        },
        "memory": None, "state": None, "channel_id": "C1", "user_id": "U1", "logger": None,
    }
    result = json.loads(handle("list_services", {}, ctx))
    assert "checkout" in result["services"]
    assert result["services"]["checkout"]["url"] == "https://checkout.internal/health"

def test_list_services_empty_config():
    ctx = {
        "config": {},
        "memory": None, "state": None, "channel_id": "C1", "user_id": "U1", "logger": None,
    }
    result = json.loads(handle("list_services", {}, ctx))
    assert "message" in result
```

Testing check_health requires an HTTP call. Use a mock HTTP server (see Mocking HTTP):

```python
import json
from unittest.mock import patch, MagicMock

def test_check_health_healthy_service():
    mock_response = MagicMock()
    mock_response.status = 200
    mock_response.__enter__ = lambda s: s
    mock_response.__exit__ = MagicMock(return_value=False)
    with patch("urllib.request.urlopen", return_value=mock_response):
        result = json.loads(handle(
            "check_health",
            {"service": "checkout", "url": "https://checkout.internal/health"},
            make_ctx(),
        ))
        assert result["healthy"] is True
        assert result["status"] == 200
        assert result["service"] == "checkout"

def test_check_health_unreachable_service():
    import urllib.error
    with patch("urllib.request.urlopen", side_effect=urllib.error.URLError("timed out")):
        result = json.loads(handle(
            "check_health",
            {"service": "auth", "url": "https://auth.internal/health"},
            make_ctx(),
        ))
        assert result["healthy"] is False
        assert "error" in result

def test_restart_service_returns_success():
    ctx = make_ctx()
    result = json.loads(handle(
        "restart_service",
        {"service": "auth", "environment": "staging"},
        ctx,
    ))
    assert result["restarted"] is True
    assert result["service"] == "auth"
    assert result["environment"] == "staging"
```

resolve_human takes the same arguments as handle but returns a string ("approve", "confirm") or None. Test every branch.

```python
from skills.services.tools import resolve_human

def test_production_restart_requires_approval():
    level = resolve_human(
        "restart_service",
        {"service": "auth", "environment": "production"},
        {},
    )
    assert level == "approve"

def test_staging_restart_is_auto_execute():
    level = resolve_human(
        "restart_service",
        {"service": "auth", "environment": "staging"},
        {},
    )
    assert level is None

def test_non_restart_tools_are_auto_execute():
    for tool in ["list_services", "check_health"]:
        level = resolve_human(tool, {}, {})
        assert level is None, f"Expected None for {tool}, got {level!r}"
```

The services skill uses urllib.request.urlopen. Patch it at the call site — the module path where it is used, not where it is defined.

```python
from unittest.mock import patch, MagicMock

def make_mock_response(status: int):
    resp = MagicMock()
    resp.status = status
    resp.__enter__ = lambda s: s
    resp.__exit__ = MagicMock(return_value=False)
    return resp

def test_check_health_503():
    with patch("skills.services.tools.urllib.request.urlopen",
               return_value=make_mock_response(503)):
        result = json.loads(handle(
            "check_health",
            {"service": "auth", "url": "https://auth.internal/health"},
            make_ctx(),
        ))
        assert result["healthy"] is False
        assert result["status"] == 503
```

If your skill uses the requests library instead of stdlib urllib, add the responses library (which mocks requests) as a dev dependency and use its @responses.activate decorator:

```python
import responses as resp_mock

@resp_mock.activate
def test_check_health_with_responses_library():
    resp_mock.add(resp_mock.GET, "https://checkout.internal/health", status=200)
    result = json.loads(handle(
        "check_health",
        {"service": "checkout", "url": "https://checkout.internal/health"},
        make_ctx(),
    ))
    assert result["healthy"] is True
```

Pass a real FilesystemMemoryBackend pointed at a tmp_path, or write a minimal in-memory stub.

Option 1: FilesystemMemoryBackend on tmp_path (recommended)

```python
import pytest
from mithai.memory.filesystem import FilesystemMemoryBackend

@pytest.fixture
def mem(tmp_path):
    return FilesystemMemoryBackend(tmp_path / "memory")

def test_restart_writes_to_memory(mem):
    ctx = {**make_ctx(), "memory": mem}
    handle("restart_service", {"service": "auth", "environment": "production"}, ctx)
    content = mem.read("restarts.md")
    assert content is not None
    assert "auth" in content
```

Option 2: a minimal in-memory stub, for tests that never touch disk:

```python
class InMemoryMemory:
    def __init__(self):
        self._store = {}

    def read(self, path):
        return self._store.get(path)

    def write(self, path, content, *, append=False):
        if append:
            self._store[path] = self._store.get(path, "") + content
        else:
            self._store[path] = content

    def exists(self, path):
        return path in self._store
```

Use it the same way:

```python
def test_restart_appends_to_log():
    mem = InMemoryMemory()
    ctx = {**make_ctx(), "memory": mem}
    handle("restart_service", {"service": "billing", "environment": "staging"}, ctx)
    assert "billing" in (mem.read("restarts.md") or "")
```

Place skill tests under tests/skills/:

```
tests/
└── skills/
    └── test_services.py
```

Run the full suite:

```sh
uv run pytest tests/ -v
```

Run only skill tests:

```sh
uv run pytest tests/skills/ -v
```

Run a single test:

```sh
uv run pytest tests/skills/test_services.py::test_check_health_healthy_service -v
```

Here is a full test file for the services skill, covering all three tools and both approval branches.

tests/skills/test_services.py

"""Tests for the services skill."""
import json
import sys
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
# Add skills directory to path so the module can be imported directly
sys.path.insert(0, str(Path(__file__).parent.parent.parent / "skills"))
from services.tools import handle, resolve_human # noqa: E402
# ── helpers ──────────────────────────────────────────────────────────────────
def make_ctx(services=None, memory=None):
return {
"config": {
"services": services or {
"checkout": {"url": "https://checkout.internal/health"},
"billing": {"url": "https://billing.internal/health"},
"auth": {"url": "https://auth.internal/health"},
}
},
"memory": memory,
"state": None,
"channel_id": "C123",
"user_id": "U456",
"logger": None,
}
def make_mock_response(status: int):
resp = MagicMock()
resp.status = status
resp.__enter__ = lambda s: s
resp.__exit__ = MagicMock(return_value=False)
return resp
# ── list_services ─────────────────────────────────────────────────────────────
def test_list_services_returns_all_configured_services():
result = json.loads(handle("list_services", {}, make_ctx()))
assert set(result["services"].keys()) == {"checkout", "billing", "auth"}
def test_list_services_empty_config_returns_message():
result = json.loads(handle("list_services", {}, make_ctx(services={})))
assert "message" in result
assert "services" not in result
# ── check_health ──────────────────────────────────────────────────────────────
def test_check_health_healthy_200():
with patch("services.tools.urllib.request.urlopen",
return_value=make_mock_response(200)):
result = json.loads(handle(
"check_health",
{"service": "checkout", "url": "https://checkout.internal/health"},
make_ctx(),
))
assert result["healthy"] is True
assert result["status"] == 200
assert result["service"] == "checkout"
assert "response_ms" in result
def test_check_health_unhealthy_503():
with patch("services.tools.urllib.request.urlopen",
return_value=make_mock_response(503)):
result = json.loads(handle(
"check_health",
{"service": "auth", "url": "https://auth.internal/health"},
make_ctx(),
))
assert result["healthy"] is False
assert result["status"] == 503
def test_check_health_url_error():
import urllib.error
with patch("services.tools.urllib.request.urlopen",
side_effect=urllib.error.URLError("connection refused")):
result = json.loads(handle(
"check_health",
{"service": "billing", "url": "https://billing.internal/health"},
make_ctx(),
))
assert result["healthy"] is False
assert "error" in result
def test_check_health_looks_up_url_from_config():
"""check_health should use ctx config when no URL is provided."""
with patch("services.tools.urllib.request.urlopen",
return_value=make_mock_response(200)):
result = json.loads(handle(
"check_health",
{"service": "checkout"}, # no url field
make_ctx(),
))
assert result["healthy"] is True
# ── restart_service ───────────────────────────────────────────────────────────
def test_restart_service_returns_success():
result = json.loads(handle(
"restart_service",
{"service": "auth", "environment": "staging"},
make_ctx(),
))
assert result["restarted"] is True
assert result["service"] == "auth"
assert result["environment"] == "staging"
def test_restart_service_writes_to_memory(tmp_path):
from mithai.memory.filesystem import FilesystemMemoryBackend
mem = FilesystemMemoryBackend(tmp_path / "memory")
ctx = {**make_ctx(), "memory": mem}
handle("restart_service", {"service": "auth", "environment": "production"}, ctx)
content = mem.read("restarts.md")
assert content is not None
assert "auth" in content
# ── resolve_human ─────────────────────────────────────────────────────────────
def test_production_restart_requires_approval():
level = resolve_human(
"restart_service", {"service": "auth", "environment": "production"}, {}
)
assert level == "approve"
def test_staging_restart_is_auto_execute():
level = resolve_human(
"restart_service", {"service": "auth", "environment": "staging"}, {}
)
assert level is None
def test_read_only_tools_are_auto_execute():
for tool in ["list_services", "check_health"]:
assert resolve_human(tool, {}, {}) is None

Unit tests verify your logic. They cannot test the full approval flow — sending a Slack message with Approve/Deny buttons, waiting for a click, and routing the result back to the engine.

For end-to-end approval testing, use mithai chat:

```sh
mithai chat
```

mithai chat runs the full engine loop in the terminal. When a tool requires approval, it prints the approval prompt and waits for you to type approve or deny. This exercises resolve_human, the Human MCP gate, handle, and the session recording path together.

For automated integration tests of the engine, see how tests/test_human_mcp.py constructs a minimal HumanMCP instance with a mock adapter — the same pattern applies if you need to test approval routing in CI.