
Troubleshooting

This guide covers the most common problems with a running mithai agent: skill loading failures, unexpected agent behavior, approval flow problems, configuration issues, and memory failures. Each section shows how to diagnose the problem and fix it.



Run with --verbose to see debug-level output including skill loading, config parsing, LLM calls, tool routing, and approval decisions:

mithai run --verbose

or for the chat interface:

mithai chat --verbose

Key patterns to look for:

| Log message | Means |
| --- | --- |
| Loaded skill: services (3 tools) | Skill loaded successfully |
| Skill services: missing TOOLS export | tools.py does not define TOOLS |
| Skill services: missing handle() function | tools.py does not define handle |
| Skipping services: missing prompt.md or tools.py | Directory is incomplete |
| Human MCP: requesting approve for services__restart_service | Approval gate triggered |
| Executing tool: services__restart_service | Tool approved and running |
| Tool denied by human: services__restart_service | User clicked Deny |
| LLM response: stop_reason=tool_use | LLM called a tool |
| LLM response: stop_reason=end_turn | LLM finished responding |

For production deployments running under systemd, follow the logs with:

sudo journalctl -u mithai -f

The mithai skill validate command checks all skills in ./skills/ for structural correctness: presence of prompt.md and tools.py, a valid TOOLS schema, a required handle function, and valid "human" levels:

mithai skill validate

Validate a single skill:

mithai skill validate services

A passing skill reports:

✓ services: OK (3 tools)

A failing skill reports the exact error:

✗ services: FAILED
- Tool 2: missing 'description'
- tools.py does not export handle() function

The mithai doctor command runs a full connectivity check: LLM API, Slack/Telegram tokens, kubectl clusters if configured, skill loading, and filesystem permissions:

mithai doctor

Exit code is 0 when all checks pass, 1 if any fail. Use this in a deployment pipeline after config changes.
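For example, a deploy pipeline can gate on that exit code. The snippet below is a hypothetical CI step (GitHub Actions syntax, shown only for illustration; adapt it to your own pipeline):

```yaml
# Hypothetical CI step: a non-zero exit from either command fails the job.
- name: Validate mithai deployment
  run: |
    mithai skill validate
    mithai doctor
```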


Symptom: The agent does not use a skill you added. mithai skill list does not show it. With --verbose, you see Skipping services: missing prompt.md or tools.py.

Diagnosis: Check that your skill directory has both required files:

ls skills/services/
# Expected: prompt.md tools.py

Check that the directory name does not start with . or _ — those are silently skipped by the loader.

Check that the skill directory is under a path listed in skills.paths in config.yaml:

skills:
  paths:
    - ./skills  # relative to where mithai is run from

If the path is correct but the skill still does not appear, run mithai skill validate services to catch import errors that would otherwise be swallowed.

Fix: Ensure prompt.md and tools.py both exist, are non-empty, and the path in config.yaml points at the parent directory containing the skill folder.
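Taken together, the discovery rules above can be sketched like this (an approximation of the described behavior, not mithai's actual loader code):

```python
from pathlib import Path

def discover_skills(root: str) -> list[str]:
    # Approximation of the loader's filtering: skip non-directories,
    # skip names starting with "." or "_", and skip directories that
    # lack prompt.md or tools.py.
    skills = []
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir() or entry.name.startswith((".", "_")):
            continue
        if (entry / "prompt.md").is_file() and (entry / "tools.py").is_file():
            skills.append(entry.name)
    return skills
```

A directory that fails any of these checks never reaches validation, which is why an incomplete skill simply does not appear in mithai skill list.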


Tool not appearing in the agent’s tool list


Symptom: The agent ignores a specific tool or says it cannot perform an action that the tool handles. --verbose shows the skill loaded but the tool is never called.

Diagnosis: The tool is not making it into the LLM’s context for one of these reasons:

  1. The tool definition is missing from TOOLS. Verify:

# In tools.py — is your tool here?
TOOLS = [
    {"name": "restart_service", ...},
]
  2. The TOOLS list has a syntax error that truncates it at parse time. Run mithai skill validate services — it imports tools.py and checks every entry.

  3. The tool description is ambiguous and the LLM is not selecting it. Improve the "description" field to be specific about when to use the tool.

Fix: Run mithai skill validate and correct any reported errors. If the tool definition is correct but the LLM is not calling it, rewrite the "description" to be more explicit, and update prompt.md to tell the agent under what conditions to use this tool.
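As an illustration, the structural part of that check could look like this (a sketch inferred from the validator's error messages, not its actual implementation):

```python
REQUIRED_KEYS = ("name", "description", "input_schema")

def validate_tools(tools: list[dict]) -> list[str]:
    # Report every tool entry that is missing a required key,
    # mirroring errors like "Tool 2: missing 'description'".
    errors = []
    for i, tool in enumerate(tools):
        for key in REQUIRED_KEYS:
            if key not in tool:
                errors.append(f"Tool {i}: missing '{key}'")
    return errors

print(validate_tools([{"name": "restart_service"}]))
# → ["Tool 0: missing 'description'", "Tool 0: missing 'input_schema'"]
```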


Symptom: The agent’s response includes raw Python objects, None, or an error like TypeError: Object of type X is not JSON serializable.

Diagnosis: handle() must return a str. The engine passes this string directly to the LLM as the tool result. Any non-string return will cause a serialization error.

Check your handle signature:

import json

def handle(name: str, input: dict, ctx: dict) -> str:
    # Must return str in every branch
    if name == "list_services":
        services = ctx.get("config", {}).get("services", {})
        return json.dumps({"services": services})  # correct: str
    # Wrong — returning a dict
    # return {"services": services}
    # Wrong — forgetting the fallthrough
    # (no return means None is returned)
    return json.dumps({"error": f"unknown tool: {name}"})

Every code path in handle() must return a str. The last line handling unknown tool names is a required safety net.

Fix: Add -> str as the return type annotation and return json.dumps(...) in every branch, including the fallthrough case.


Symptom: The agent runs a tool that should require approval without asking. Or the tool always asks for approval when you expected it to auto-execute based on resolve_human logic.

Diagnosis: resolve_human is only called when the tool’s "human" field is set to "dynamic". Check the tool definition:

TOOLS = [
    {
        "name": "restart_service",
        "description": "...",
        "input_schema": {...},
        "human": "dynamic",  # required for resolve_human to be called
    },
]

If "human" is set to "approve" or omitted, resolve_human is never called — the static level is used directly.

Also verify that resolve_human is exported at the module level (not nested inside a class or function):

# Correct — top-level function
def resolve_human(name: str, input: dict, ctx: dict) -> str | None:
    ...

Run mithai skill validate services — it checks that handle is callable but does not currently validate resolve_human. Add a quick Python import check if in doubt:

python3 -c "from skills.services.tools import resolve_human; print('ok')"

Fix: Set "human": "dynamic" on the tool definition and ensure resolve_human is a top-level function in tools.py.


Symptom: You send a message in Slack and nothing happens. No response, no error visible in Slack.

Diagnosis — check in this order:

  1. Is the bot running?

# systemd
sudo systemctl status mithai
# Docker
docker compose ps
  2. Is the bot invited to the channel?

The bot must be a member of the channel before it receives messages. In Slack, type /invite @your-bot-name in the channel.

  3. Is the bot token valid?

Run mithai doctor — it calls auth.test against the Slack API and reports the workspace name on success or the exact error on failure.

  4. Is the app token configured for Socket Mode?

The Socket Mode adapter (adapter.types: [slack]) requires both a bot token (xoxb-...) and an app-level token (xapp-...). The app-level token must have the connections:write scope. Check your app at https://api.slack.com/apps under Basic Information → App-Level Tokens.

  5. Check the process logs:

sudo journalctl -u mithai -n 50

Look for invalid_auth, missing_scope, or connection error messages.


Agent responding to everything instead of only mentions


Symptom: The bot replies to every message in a channel, not just messages that @mention it.

Diagnosis: Check respond in config.yaml:

adapter:
  slack:
    respond: mentions  # correct: only reply to @mentions
    # respond: all     # this causes the bot to reply to everything

The default when respond is omitted is "all". Set it explicitly to "mentions" for normal Slack deployments.

Fix:

adapter:
  slack:
    bot_token: ${SLACK_BOT_TOKEN}
    app_token: ${SLACK_APP_TOKEN}
    respond: mentions

Restart the agent after changing this.


Symptom: The agent does not remember things said earlier in the same thread. Each message is treated as a new conversation.

Diagnosis: The agent scopes sessions to thread_ts in Slack. If you are messaging in the channel top-level (not in a thread), each top-level message starts a new session scoped to channel_id. The agent accumulates history within a thread, not across the channel.

Check session storage:

ls .mithai/state/sessions/

If no session files exist, the state backend is not writing to disk. Verify the path is correct and writable:

state:
  backend: filesystem
  filesystem:
    path: ./.mithai/state

Check the directory permissions:

ls -la .mithai/state/

Also confirm sessions.max_history is not set to 0:

sessions:
  max_history: 10  # number of turns to replay into context
  max_stored: 50   # number of turns to keep on disk

Fix: Start a thread by replying to the bot’s first message — sessions accumulate within a thread. For cross-thread memory, the agent uses MEMORY.md via the memory skill.


Symptom: The agent calls tools repeatedly without producing a final response. It looks like it is looping — calling the same tool or a sequence of tools over and over.

Diagnosis: This happens when:

  1. A tool is returning an error and the agent retries it. Check what the tool returns — if it returns {"error": "..."}, the LLM may attempt the same call with slightly different inputs hoping to succeed.

  2. A tool is returning data the LLM does not understand and it keeps trying to reinterpret it. Check the JSON shape your tool returns and whether it matches what prompt.md describes.

  3. The LLM is calling a tool to complete a previous tool call’s intent (chaining too deep). Check --verbose logs to see the full sequence of tool calls.

Fix options:

  • Fix the underlying tool error so it returns useful results.
  • Make handle() return clearer error messages that tell the LLM when to stop retrying.
  • Add guidance to prompt.md telling the agent to stop after N failed attempts.
  • Use the human.overrides config to force a specific tool to require approval, giving you a chance to interrupt the loop manually:
human:
  overrides:
    services__restart_service: approve
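For the second option, a handle() branch that tells the model a failure is permanent might look like this (a sketch; the "retryable" and "hint" fields in the returned JSON are illustrative, not a mithai convention):

```python
import json

def handle(name: str, input: dict, ctx: dict) -> str:
    if name == "restart_service":
        known = ctx.get("config", {}).get("services", {})
        service = input.get("service", "")
        if service not in known:
            # A clear, explicitly non-retryable error stops the LLM
            # from re-calling the tool with tweaked inputs.
            return json.dumps({
                "error": f"unknown service '{service}'",
                "retryable": False,
                "hint": "Ask the user which service they meant.",
            })
        return json.dumps({"status": "restarted", "service": service})
    return json.dumps({"error": f"unknown tool: {name}"})
```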

Symptom: You expect an approval prompt but the tool runs automatically.

Diagnosis — check in order:

  1. Is the "human" field set on the tool?
TOOLS = [
    {
        "name": "restart_service",
        "description": "...",
        "input_schema": {...},
        "human": "approve",  # this must be present
    },
]

Run mithai skill validate services — it reports missing or invalid "human" values.

  2. Is there a config override removing the approval requirement?

Check config.yaml:

human:
  overrides:
    services__restart_service: null  # null removes the requirement

  3. If using "human": "dynamic", is resolve_human returning None?

Add a temporary print or log statement to resolve_human to confirm it is being called and what it returns. The approval is skipped if resolve_human returns None.

  4. Has auto-promote kicked in?

After learning.approval_auto_promote consecutive approvals (default: 3) with no denials for the same tool+input combination, mithai auto-promotes that specific input to auto-execute. This is stored in memory/approvals.json. Check:

cat memory/approvals.json

Fix: If auto-promote removed an approval you want to keep, delete the relevant entry from memory/approvals.json, or set learning.approval_auto_promote: 0 to disable auto-promotion entirely.


Symptom: The agent reports the tool was denied even though no one clicked Deny. Logs show a timeout.

Diagnosis: The default approval timeout is 300 seconds (5 minutes). Check your config:

adapter:
  slack:
    approval_timeout: 300  # seconds

If your team needs more time, increase this. Note that the agent thread blocks while waiting for approval — a very long timeout (e.g. 3600) means the agent holds an open connection to Slack for that duration.

Fix: Increase approval_timeout:

adapter:
  slack:
    approval_timeout: 600  # 10 minutes

Approved command still asks for approval next time

Symptom: You approved a command, it ran successfully, but the next time the same command runs it still asks for approval. Auto-promote is not working.

Diagnosis: Auto-promote requires the same tool and the same normalized input to be approved approval_auto_promote times (default: 3) with zero denials. Check:

  1. Is learning.enabled: true in config.yaml?
  2. Is memory configured and the memory directory writable?
  3. Is the approval count in memory/approvals.json actually incrementing?
cat memory/approvals.json

You should see something like:

{
  "services__restart_service": {
    "{\"environment\":\"staging\",\"service\":\"auth\"}": {
      "approved": 2,
      "denied": 0
    }
  }
}

The input key is the JSON-serialized input dict with sorted keys. If the inputs differ between calls (e.g. different whitespace or key order), each is tracked as a separate entry.

  4. Is approval_auto_promote set to a value other than 3?

learning:
  approval_auto_promote: 3  # promote after this many consecutive approvals

Fix: Verify the memory backend is writing (cat memory/approvals.json shows real data), check the approved count is incrementing, and confirm the threshold matches your config.
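The normalization can be reproduced to predict which entry a given call maps to, assuming sorted-key, compact JSON serialization as shown in the example above:

```python
import json

def approval_key(input: dict) -> str:
    # Sorted keys + compact separators: two calls with the same values
    # produce the same key regardless of dict key order.
    return json.dumps(input, sort_keys=True, separators=(",", ":"))

a = approval_key({"service": "auth", "environment": "staging"})
b = approval_key({"environment": "staging", "service": "auth"})
assert a == b == '{"environment":"staging","service":"auth"}'
```

If two calls you expected to count together map to different keys here, they differ in actual values, not just formatting.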


Environment variable not substituted (shows ${VAR} literally)


Symptom: The agent fails to connect and logs show invalid_auth or a similar error. Closer inspection reveals the token is literally the string ${SLACK_BOT_TOKEN}.

Diagnosis: The environment variable is not set in the environment when mithai run starts. Check:

echo $SLACK_BOT_TOKEN

If it prints nothing, the variable is not exported. Also check whether a .env file is present in the same directory as config.yaml:

ls -la .env

mithai calls python-dotenv’s load_dotenv() on the .env file in the config directory at startup. If the file does not exist or is not in the right directory, variables are not loaded.
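The symptom follows from how ${VAR} substitution typically behaves: unset variables are left literal. A sketch of that behavior (not mithai's actual code):

```python
import os
import re

def substitute_env(value: str) -> str:
    # Replace each ${NAME} with its environment value; leave the
    # placeholder untouched when the variable is unset — which is
    # exactly the literal-token symptom described above.
    pattern = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
    return pattern.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)
```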

Fix: Create .env in the same directory as config.yaml:

SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
ANTHROPIC_API_KEY=sk-ant-...

Then restart mithai.


Symptom: mithai run exits immediately with a YAML parse error or Config validation failed.

Diagnosis: YAML is whitespace-sensitive. Common mistakes:

  • Tabs instead of spaces (YAML requires spaces)
  • Inconsistent indentation
  • Unquoted strings containing : or #
  • A multiline system_prompt missing the | block scalar indicator

Test the file standalone:

python3 -c "import yaml; yaml.safe_load(open('config.yaml'))"

If it raises a ScannerError, it reports the line number. If it parses fine, the error is a schema validation failure — run mithai doctor for a clearer error message.

Fix: Correct the YAML at the reported line. For multiline strings, use the block scalar:

bot:
  system_prompt: |
    You are a helpful operations assistant.
    You have access to skills that let you interact with infrastructure.

Skill config not reaching ctx["config"]

Symptom: ctx.get("config", {}) returns {} in your skill, but you have config defined in config.yaml.

Diagnosis: The config key in ctx is populated from skills.config.<skill_name>. The skill name must match the directory name exactly.

Check config.yaml:

skills:
  config:
    services:  # this key must match the skill directory name
      services:
        checkout:
          url: https://...

Check the skill directory name:

ls skills/
# Expected: services/

If the skill is named service (singular) but the config key is services (plural), ctx["config"] will be {}.

Fix: Make the key under skills.config match the skill directory name exactly.


memory_write failing with path validation error


Symptom: The memory skill’s memory_write tool returns an error like Invalid path: ../etc/passwd or ValueError: Invalid path.

Diagnosis: The memory backend enforces a path traversal guard. Any path that resolves outside the memory root directory is rejected. This includes paths starting with ../, absolute paths starting with /, and any path that after normalization escapes the root.

Valid paths:

  • MEMORY.md
  • daily/2024-01-15.md
  • notes/services.md

Invalid paths:

  • ../config.yaml
  • /etc/passwd
  • ../../secrets

Fix: Use relative paths that stay within the memory root. The memory root is set by learning.memory.filesystem.path in config.yaml (default: ./memory).
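The guard's effect can be reproduced with pathlib (a sketch of the check under the rules described above, not mithai's actual implementation):

```python
from pathlib import Path

def is_safe_memory_path(root: str, rel: str) -> bool:
    # Resolve the candidate against the memory root and reject
    # anything that escapes it: "../" traversal, absolute paths,
    # or any path that normalizes to a location outside the root.
    root_path = Path(root).resolve()
    candidate = (root_path / rel).resolve()
    return candidate.is_relative_to(root_path)
```

Note that joining an absolute path like /etc/passwd onto the root replaces the root entirely, which is why absolute paths fail the is_relative_to check.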


Symptom: The agent does not remember facts you told it to remember. MEMORY.md exists on disk but the agent behaves as if it has never seen it.

Diagnosis: Check that learning.enabled: true and the memory backend is correctly configured:

learning:
  enabled: true
  memory:
    backend: filesystem
    filesystem:
      path: ./memory

Verify the file exists and is non-empty:

cat memory/MEMORY.md

With --verbose, look for the memory injection in the startup logs. The engine calls self._memory.read("MEMORY.md") on every request. If it returns None or empty string, nothing is injected.

Check the memory path is an absolute path or is correctly relative to the working directory from which mithai run is executed. If you run mithai run from /home/user/mybot/ but the config says path: ./memory, the backend expects /home/user/mybot/memory/.
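You can confirm where a relative path actually lands with plain path arithmetic (the directories here are illustrative):

```python
from pathlib import PurePosixPath

# A relative memory path resolves against the directory `mithai run`
# is started from, not against config.yaml's location.
working_dir = PurePosixPath("/home/user/mybot")  # illustrative cwd
memory_root = working_dir / "memory"             # config said ./memory
print(memory_root)  # /home/user/mybot/memory
```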

Fix: Run mithai run from the project root (where config.yaml lives), or use an absolute path:

learning:
  memory:
    backend: filesystem
    filesystem:
      path: /var/lib/mithai/memory