Scheduling
A pre-registered forward test only earns its labeled-outcomes dataset if
the factory actually runs every trading day. This page is the operator’s
recipe for that — what to schedule, what order it has to run in, and
the two pitfalls (PATH and editable-install drift) that bite first.
If you haven’t pre-registered the experiment yet, do that first via
cents experiment register. The schedule below
assumes an active experiment is already in place.
What needs to run on a schedule
For a two-arm forward test, the following commands run on a cadence:
| Command | Cadence | Why |
|---|---|---|
| cents event refresh | Daily, before open (e.g. 06:00 ET) | Pulls Federal Register events; fires PREMISE_INVALIDATION alerts against any open thesis whose premise_tags intersect an event’s tags. |
| cents factory run | Daily, after the refresh (e.g. 06:30 ET) | The LLM arm: the real multi-agent stack. Stamps orchestrator_label = "llm" on every thesis it opens. |
| cents factory run --orchestrator random --orchestrator-seed N | Daily, same window, separate invocation | The control arm. Uniform-random conviction, no LLM calls. Stamps orchestrator_label = "random". Without this you cannot tell whether the LLM arm produced signal or rode the universe. |
| cents shadow backfill --horizon 30 | Daily (or weekly) | Walks the shadow_opens table past the 30-day horizon and fills forward returns from price history. This is what gives you a baseline against rejected candidates. |
| cents universe ingest-delistings | Weekly | Keeps the delistings table current. Forward runs are unbiased today; the longer this has been running, the less retrospective survivorship bias bleeds into back-dated universe reconstructions. See Scope. |
| cents experiment status --output json | Daily, appended to a log | Lets you watch verdict_ready and opened_by_arm evolve over the 90 days without having to remember to check. |
| cents eval run --persist-history + cents eval drift-check | Daily, after factory runs | Persists today’s eval metrics and fires a MODEL_DRIFT alert if F1 falls more than 5pp below the trailing-7 median. Cheap insurance against silent classifier regressions across model snapshot bumps. |
Order matters in one place: event refresh must complete before
factory run, otherwise the day’s events haven’t had a chance to
invalidate stale theses before the open phase considers new ones.
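One way to make that ordering explicit, rather than relying on a time gap between two schedule entries, is to chain the steps so the factory run starts only if the refresh exited successfully. The following is a minimal sketch under stated assumptions: run_in_order is a hypothetical helper (not part of cents), and the example commands reuse the cents-wrap wrapper path from Recipe 1.

```shell
#!/usr/bin/env bash
# run_in_order is a hypothetical helper, not part of cents.
# Each argument is one full command line. Execution stops at the first
# failure, so a later step never starts unless every earlier step exited 0.
run_in_order() {
  local step
  for step in "$@"; do
    if ! bash -c "$step"; then
      echo "aborting chain: '$step' failed" >&2
      return 1
    fi
  done
}

# Example (not executed here): refresh events, then run the LLM arm.
# run_in_order \
#   "$HOME/.cents/bin/cents-wrap event-refresh event refresh" \
#   "$HOME/.cents/bin/cents-wrap factory-llm factory run --max-cost-usd 10"
```

The trade-off is that a failed refresh now blocks the day's factory run entirely; whether that is preferable to running against un-refreshed events is a judgment call for the operator.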
Recipe 1: cron (Linux / macOS)
A wrapper script makes the cron entries readable and gives one place to
manage PATH, the timezone, and the path to your editable install.
~/.cents/bin/cents-wrap (mark it executable with chmod +x):

    #!/usr/bin/env bash
    # Wrapper so cron / launchd dispatch to the right cents binary
    # regardless of which working tree is currently checked out.
    set -euo pipefail

    # Run cents and date-stamp the log directory in US Eastern time, even on a
    # non-US machine. This does not change when cron or launchd fires the job.
    export TZ="America/New_York"

    # Use the venv's cents directly; never rely on PATH lookup from cron.
    # Adjust to wherever your editable install lives.
    CENTS_BIN="$HOME/.venvs/cents/bin/cents"

    # API keys: read from ~/.cents/config.toml at runtime. If you'd rather
    # keep them out of the config file, export them here instead:
    # export ANTHROPIC_API_KEY="..."
    # export FMP_API_KEY="..."

    # Per-day log directory.
    LOG_DIR="$HOME/.cents/logs/$(date +%Y-%m-%d)"
    mkdir -p "$LOG_DIR"

    # First argument is the log name; everything after it is passed to cents.
    CMD_NAME="$1"; shift
    exec "$CENTS_BIN" "$@" >>"$LOG_DIR/${CMD_NAME}.log" 2>&1

Then crontab -e:
    # m h dom mon dow command
    # TZ sets the time zone of each job's environment. On cronie, CRON_TZ also
    # makes the hour fields below mean ET. Other cron implementations treat
    # CRON_TZ as a plain variable and read the hour fields in the machine's
    # local time, so adjust the hours there if the machine is not on ET.
    TZ=America/New_York
    CRON_TZ=America/New_York

    # 06:00 ET: pull policy events, fire premise-invalidation alerts.
    0 6 * * 1-5 ~/.cents/bin/cents-wrap event-refresh event refresh

    # 06:30 ET: LLM arm. Cost-capped so a hung loop can't run up unbounded charges.
    30 6 * * 1-5 ~/.cents/bin/cents-wrap factory-llm factory run --max-cost-usd 10

    # 06:35 ET: control arm. Same cadence, different orchestrator, fixed seed for repro.
    35 6 * * 1-5 ~/.cents/bin/cents-wrap factory-random factory run --orchestrator random --orchestrator-seed 42

    # 07:00 ET: backfill 30-day forward returns for rejected candidates.
    0 7 * * 1-5 ~/.cents/bin/cents-wrap shadow shadow backfill --horizon 30

    # Sunday 04:00 ET: refresh the delistings table.
    0 4 * * 0 ~/.cents/bin/cents-wrap delistings universe ingest-delistings

    # 18:00 ET: daily experiment-status snapshot for trend-watching.
    0 18 * * 1-5 ~/.cents/bin/cents-wrap status experiment status --output json

    # 18:30 ET: eval harness against golden sets + drift check.
    # Persists today's metrics, then fires MODEL_DRIFT alert if F1 fell >5pp
    # below the trailing-7 median. Cheap insurance against silent classifier
    # regressions when the upstream Haiku snapshot bumps.
    30 18 * * 1-5 ~/.cents/bin/cents-wrap eval-run eval run --persist-history
    35 18 * * 1-5 ~/.cents/bin/cents-wrap eval-drift eval drift-check

Logs land at ~/.cents/logs/YYYY-MM-DD/<command>.log. The per-day
directory makes failed-day archaeology trivial (ls ~/.cents/logs/ is
your run history).
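Because every scheduled entry writes to a predictably named file, a missing or empty log is a cheap signal that an entry did not fire. The sketch below assumes a hypothetical helper named missing_logs (not part of cents); the log names in the example are the first arguments passed to cents-wrap in the crontab above. An empty log is only a hint, since a command that succeeds silently would also leave one.

```shell
#!/usr/bin/env bash
# missing_logs is a hypothetical helper, not part of cents.
# Usage: missing_logs LOG_DIR NAME...
# Prints each NAME whose NAME.log is absent or empty in LOG_DIR.
# Returns 1 if anything was printed, 0 otherwise.
missing_logs() {
  local dir="$1"; shift
  local name status=0
  for name in "$@"; do
    # -s is true only for a file that exists and has a size greater than zero.
    if [ ! -s "$dir/$name.log" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Example (not executed here): check today's morning entries.
# missing_logs "$HOME/.cents/logs/$(date +%Y-%m-%d)" \
#   event-refresh factory-llm factory-random shadow
```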
Recipe 2: launchd (macOS)
launchd keeps running across reboots and doesn’t need a logged-in
shell. Drop these under ~/Library/LaunchAgents/ and load each one
with launchctl load -w <plist>.
~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>Label</key>
      <string>ai.dollars-and-cents.factory-llm</string>

      <key>ProgramArguments</key>
      <array>
        <string>/Users/YOU/.cents/bin/cents-wrap</string>
        <string>factory-llm</string>
        <string>factory</string>
        <string>run</string>
        <string>--max-cost-usd</string>
        <string>10</string>
      </array>

      <!-- 06:30, weekdays only. launchd has no TZ key and fires in the machine's
           local time zone; the TZ set by the wrapper only affects the process
           environment, so adjust Hour if the machine is not on ET. -->
      <key>StartCalendarInterval</key>
      <array>
        <dict><key>Weekday</key><integer>1</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
        <dict><key>Weekday</key><integer>2</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
        <dict><key>Weekday</key><integer>3</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
        <dict><key>Weekday</key><integer>4</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
        <dict><key>Weekday</key><integer>5</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
      </array>

      <!-- Don't run on load; only when the calendar interval triggers. -->
      <key>RunAtLoad</key>
      <false/>

      <key>StandardOutPath</key>
      <string>/Users/YOU/.cents/logs/launchd-factory-llm.out</string>
      <key>StandardErrorPath</key>
      <string>/Users/YOU/.cents/logs/launchd-factory-llm.err</string>

      <key>EnvironmentVariables</key>
      <dict>
        <key>TZ</key>
        <string>America/New_York</string>
        <key>PATH</key>
        <string>/usr/local/bin:/usr/bin:/bin</string>
      </dict>
    </dict>
    </plist>

Make one plist per scheduled command (factory-llm, factory-random,
event-refresh, shadow-backfill, delistings, experiment-status, eval-run,
eval-drift), varying the Label, ProgramArguments, StartCalendarInterval,
and log paths. The wrapper script handles the rest.
Load and verify:
    launchctl load -w ~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist
    launchctl list | grep dollars-and-cents
    # To unload (e.g. before editing the plist):
    launchctl unload ~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist

The -w flag persists the load state across reboots.
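Since each plist needs five near-identical StartCalendarInterval entries, generating them reduces copy-paste mistakes when you vary the time per command. This is a small sketch, assuming a hypothetical helper named weekday_intervals (not part of cents or launchd); paste its output inside the StartCalendarInterval array.

```shell
#!/usr/bin/env bash
# weekday_intervals is a hypothetical helper, not part of cents or launchd.
# Usage: weekday_intervals HOUR MINUTE
# HOUR and MINUTE must be plain integers without zero padding (6, not 06).
# Prints one <dict> per weekday; launchd numbers Monday as 1 and Friday as 5.
weekday_intervals() {
  local hour="$1" minute="$2" day
  for day in 1 2 3 4 5; do
    printf '<dict><key>Weekday</key><integer>%d</integer>' "$day"
    printf '<key>Hour</key><integer>%d</integer>' "$hour"
    printf '<key>Minute</key><integer>%d</integer></dict>\n' "$minute"
  done
}

# Example: the five entries for the 06:35 control-arm run.
# weekday_intervals 6 35
```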
Common pitfalls
Verifying it’s working
Five quick checks. Run them on day 2 of any new schedule, before you trust it to run unattended for 90 days.
    # 1. The latest run actually executed and produced output.
    tail -F ~/.cents/logs/$(date +%Y-%m-%d)/factory-llm.log

    # 2. Both arms are accumulating theses against the active experiment.
    cents experiment status --output json | jq .opened_by_arm
    # Expect something like: { "llm": 12, "random": 11 }; counts should grow daily.

    # 3. Premise invalidation is firing when events match an open thesis.
    cents alert list --unread

    # 4. The labeled-outcomes dataset is populating across discovery / cohort / regime.
    cents factory analyze --by cohort
    cents factory analyze --by discovery,regime

    # 5. Spend is where you expect: well under any --max-cost-usd cap.
    cents usage headroom
    # Expect: "Status: ok". "approaching_cap" or "hit_cap" means tune the cap up.
    # Per-agent attribution: cents usage summary --by agent

If opened_by_arm shows only "llm", the random-orchestrator cron
entry isn’t firing. Check its log, then check that the experiment is
still active (cents experiment list).
How this connects to the rest of the docs
The schedule produces the labeled outcomes dataset. Whether you should trust the cohort numbers it generates is a separate question, answered by Scope: survivorship coverage, lookahead audit results, calibration honesty, and the gates the engine deliberately does not apply. Read that page before you start drawing conclusions from whatever the 90-day window leaves behind.
Troubleshooting — what to do when things break
A 30-90 day cron-driven pilot will hit failure modes you won’t see in a one-shot run. Here’s what to expect and how to handle each.
A run hangs longer than ~15 minutes
The factory engine has two layers of timeout protection (see CLAUDE.md):
- Per-Anthropic-call: 30s read timeout (overridable via CENTS_ANTHROPIC_TIMEOUT_SEC). Worst case ~106s per call after retries + backoff.
- Per-symbol watchdog: 90s deadline on the entire orchestrator-research call (overridable via CENTS_PER_SYMBOL_DEADLINE_SEC). On expiry the symbol is logged + skipped via symbols_timed_out in summary_json; the run continues.
So a run that hangs >15 min is almost certainly something the watchdogs can’t reach — NewsAPI / FMP / Alpaca network stall, a database lock, or a wedged file handle. Identify with:
    # Is the process active?
    ps aux | grep "cents factory"

    # What sockets does it hold? (open TCP connections often surface the culprit)
    # -d, joins multiple PIDs with commas, which is the list format lsof -p expects.
    lsof -p "$(pgrep -d, -f 'cents factory')" | grep TCP

    # What was the last LLM activity?
    sqlite3 ~/.cents/data/cents.db \
      "SELECT operation, MAX(called_at) FROM llm_usage GROUP BY operation"

If hung >30 min, send SIGINT (Ctrl-C / kill -INT). The engine catches
KeyboardInterrupt cleanly and persists the partial run with error
set in factory_runs. The cron should not retry the same minute.
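The manual SIGINT step can also be applied automatically, so that an unattended run cannot hang indefinitely. The sketch below assumes GNU coreutils timeout is on PATH (on macOS it is commonly installed as gtimeout) and uses a hypothetical helper named run_with_deadline, which is not part of cents. It sends SIGINT at the deadline, matching the interrupt the engine is described as handling, and falls back to SIGKILL after a grace period.

```shell
#!/usr/bin/env bash
# run_with_deadline is a hypothetical helper, not part of cents.
# Usage: run_with_deadline DEADLINE_SEC GRACE_SEC COMMAND [ARG...]
# Assumes GNU coreutils `timeout` is available on PATH.
run_with_deadline() {
  local deadline="$1" grace="$2"
  shift 2
  # SIGINT at the deadline; SIGKILL if the process is still alive GRACE_SEC later.
  # Exit status: 124 if the deadline fired, 137 if SIGKILL was needed,
  # otherwise the command's own exit status.
  timeout --signal=INT --kill-after="$grace" "$deadline" "$@"
}

# Example (not executed here): 30-minute deadline, 60-second grace period.
# run_with_deadline 1800 60 "$HOME/.venvs/cents/bin/cents" factory run --max-cost-usd 10
```

Calling the cents binary directly, rather than a shell pipeline, keeps the signal path simple: the process that receives SIGINT is the one that knows how to persist the partial run.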
Cost cap hit mid-loop — partial runs and cohort bias
When pre-call cost estimation projects the next LLM call would exceed
--max-cost-usd (per-run) or max_llm_spend_usd_per_day (daily),
CostCapExceeded is raised and the engine aborts. The partial run is
persisted with error = "cost_cap_exceeded".
This biases the cohort. Symbols evaluated before the abort tend to be
the most conviction-heavy (since the engine processes the universe in
shuffled order and the first 5 above-threshold names cost the most LLM
time on premise classification). The engine does NOT keep a cursor
between runs — the next day’s factory run reshuffles the universe with
a fresh seed and walks it from the top, so symbols that got cut off
yesterday are not preferentially picked up today. Net effect: days that
hit the cap contribute fewer opens than days that finished, and the
opens they DO contribute skew toward the high-conviction tail. Operators
reading cents factory analyze --by discovery will not see this — it
shows in cents factory analyze --by orchestrator only as a thinner
LLM-arm opened count on bad-cap days.
For the 30-day pilot, pick ONE of these strategies before the first scheduled run:
- Size the cap from a wide dry-run first. Run cents factory run --dry-run over the actual pilot universe and watch the projected cost. Set --max-cost-usd ~20% above the observed peak so noise won’t trip it.
- Set the per-run cap loosely and rely on the daily cap as the real ceiling. This trades a chance of one expensive run for guaranteed full-universe coverage on the days that complete; the daily cap still bounds total spend if something pathological happens.
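For the first strategy, the arithmetic is simply the observed peak plus a margin. A minimal sketch, assuming a hypothetical helper named cap_with_margin (not part of cents) and a peak cost that you read off the dry-run output yourself:

```shell
#!/usr/bin/env bash
# cap_with_margin is a hypothetical helper, not part of cents.
# Usage: cap_with_margin PEAK_USD [MARGIN_PCT]
# Prints PEAK_USD raised by MARGIN_PCT percent (default 20), to two decimals.
cap_with_margin() {
  local peak="$1" margin="${2:-20}"
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v peak="$peak" -v margin="$margin" \
    'BEGIN { printf "%.2f\n", peak * (1 + margin / 100) }'
}

# Example: an observed peak of $0.20 per run with the default 20% margin.
# cap_with_margin 0.20    # prints 0.24
```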
If you see frequent cost-cap aborts mid-pilot:
- Raise the cap (a $10/day ceiling is well above observed ~$0.20/run)
- Move to a tighter universe so per-run cost is bounded
- Investigate why one run blew past expectations (cents usage summary --by operation --since 1d will show which agent spiked)
Anthropic / FMP / NewsAPI outage
cents agents are designed to fail soft:
- Anthropic outage: sentiment falls back to keyword scoring; premise classifier returns [] (the no-thesis path keeps tagger_failed events surfaced as WARNING logs so they aren’t silently dropped). Hit rate will look worse for that day but the run completes.
- FMP outage: fundamentals + moat agents return zero signal for the affected symbols. Affected symbols silently drop out of the cohort, which is not great but bounded.
- NewsAPI outage: sentiment returns zero signal. Same shape as FMP.
- Alpaca outage: the run will hang on get_latest_price calls; watchdog catches at 90s. Symbols with bad price data are skipped.
The daily eval-harness drift check (cents eval drift-check) is the
backstop — if F1 falls >5pp below the trailing-7 median, a MODEL_DRIFT
alert fires regardless of which API was the root cause.
Auditing a failed or weird run
Every factory run writes a row to factory_runs:

    sqlite3 ~/.cents/data/cents.db \
      "SELECT id, started_at, completed_at, theses_opened, llm_cost_usd, error, summary_json FROM factory_runs ORDER BY started_at DESC LIMIT 5"

For the most-recent run, look at:
- error: populated on cost-cap, exception, or kill-signal aborts
- summary_json.stop_reason: max_new_per_run (normal), end_of_universe (cap raised too high or universe shrunk), cost_cap (see above), kill_switch (future use)
- summary_json.symbols_timed_out: non-zero means the per-symbol watchdog fired; check logs for which symbols
- LLM call provenance via cents evidence trace <evidence_id>: reconstructs the prompt/response/cost for any specific decision
Recommended cron-wrapper alerting
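A bare cron entry is silent on failure. Wrap each run in a script that captures exit status and surfaces anomalies:

    #!/usr/bin/env bash
    set -o pipefail

    # Assumption: ALERT_TO is a placeholder recipient; set it to your own address.
    ALERT_TO="you@example.com"

    LOG_DIR=~/.cents/logs/$(date +%Y-%m-%d)
    mkdir -p "$LOG_DIR"

    LLM_LOG="$LOG_DIR/factory-llm.log"
    RANDOM_LOG="$LOG_DIR/factory-random.log"
    EVAL_LOG="$LOG_DIR/eval-gate.log"

    # Assumed alert branch on each command: mail the log when the exit status is non-zero.
    cents factory run --max-cost-usd 10.00 > "$LLM_LOG" 2>&1 \
      || mail -s "cents factory-llm failed" "$ALERT_TO" < "$LLM_LOG"

    cents factory run --orchestrator random \
      --orchestrator-seed "$(date +%s)" > "$RANDOM_LOG" 2>&1 \
      || mail -s "cents factory-random failed" "$ALERT_TO" < "$RANDOM_LOG"

    # Drift check at the end: emails on regression
    cents eval run --persist-history --gate \
      --baseline-f1 0.66 --baseline-brier 0.06 --tolerance-pp 3 \
      > "$EVAL_LOG" 2>&1 \
      || mail -s "cents eval gate failed" "$ALERT_TO" < "$EVAL_LOG"

Replace mail with whatever notification channel you use (Slack
webhook, ntfy, healthchecks.io ping, etc.).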
Anthropic model snapshot bumps mid-pilot
Every eval call records the upstream model_snapshot (e.g.
claude-haiku-4-5-20260301). When Anthropic ships a new minor version of
the Haiku family mid-pilot, that string changes and cents eval run
will start scoring against an effectively different classifier — the
gold-set numbers from before the bump are no longer comparable to the
numbers after it. cents eval drift-check will likely fire a
MODEL_DRIFT alert on the first post-bump day because F1 has shifted
relative to the trailing-7 median. This is expected, not a
regression — but the runbook is different from “actual drift” and the
two look identical at first glance.
When a MODEL_DRIFT alert fires:
- Check whether model_snapshot changed in the trailing window:

      sqlite3 ~/.cents/data/cents.db \
        "SELECT DISTINCT model, COUNT(*) FROM llm_usage
         WHERE called_at >= datetime('now', '-7 days')
         GROUP BY model ORDER BY MIN(called_at)"

  If you see two distinct snapshot strings in the trailing 7 days, the bump is the cause.
- If the snapshot changed: reset the baseline so subsequent days compare apples to apples, and clear the trailing history so the pre-bump days don’t pollute the median:

      # Re-run against the new snapshot and persist a fresh baseline.
      cents eval run --persist-baseline
      # Archive then clear the pre-bump history so drift-check has a clean window.
      mv ~/.cents/data/eval_history ~/.cents/data/eval_history-pre-$(date +%Y%m%d)
      mkdir -p ~/.cents/data/eval_history

  Record the reset date and the old/new snapshot strings in the experiment notes. That is the cohort-analytics record that someone reading the pilot results six months later will need to interpret a step change in eval metrics partway through the window.
- If the snapshot did NOT change: this is real drift. Investigate (upstream API behaviour, gold-set data, recently-merged classifier prompt changes) before re-baselining.
Plan for at least one such bump in any 30-day pilot — Anthropic typically refreshes Haiku snapshots monthly. The eval harness staying green across the bump is what makes the rest of the experiment results defensible.
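The first triage step (did the snapshot string change inside the window?) reduces to counting distinct values, which is easy to script so the answer is ready when the alert arrives. A rough sketch, assuming a hypothetical helper named count_distinct (not part of cents); the example query reuses the llm_usage table and model column from the check described earlier in this section.

```shell
#!/usr/bin/env bash
# count_distinct is a hypothetical helper, not part of cents.
# Reads one value per line on stdin and prints how many distinct
# non-blank values it saw.
count_distinct() {
  awk 'NF && !seen[$0]++ { n++ } END { print n + 0 }'
}

# Example (not executed here): more than one distinct model string in the
# trailing 7 days points at a snapshot bump rather than real drift.
# n=$(sqlite3 ~/.cents/data/cents.db \
#   "SELECT DISTINCT model FROM llm_usage WHERE called_at >= datetime('now', '-7 days')" \
#   | count_distinct)
# [ "$n" -gt 1 ] && echo "snapshot changed inside the trailing window"
```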
Backing up pilot data
The 30-day pilot’s entire experiment dataset (theses, evidence,
llm_usage, outcomes, experiment registry, eval history) lives in a
single SQLite file at ~/.cents/data/cents.db. One disk failure,
accidental rm, or filesystem corruption is total loss of the labeled
outcomes dataset and there is no way to reconstruct it after the fact.
Back it up daily, separately from the factory cron.
    # Daily SQLite backup. Use the sqlite3 .backup dot-command rather than a
    # plain cp: copying while a factory run is mid-flight can produce a corrupt
    # copy because WAL pages may be in flight. .backup acquires the right locks
    # and writes a consistent snapshot.
    # $HOME is used instead of ~ because the shell does not expand ~ inside
    # the quoted dot-command.
    mkdir -p "$HOME/.cents/data/backups"
    sqlite3 "$HOME/.cents/data/cents.db" \
      ".backup $HOME/.cents/data/backups/cents-$(date +%Y%m%d).db"

    # Optional: retain the trailing 14 days, drop everything older.
    find "$HOME/.cents/data/backups/" -name 'cents-*.db' -mtime +14 -delete

Schedule the backup at least 2 hours offset from the factory cron
(e.g. factory at 06:30 ET, backup at 08:30 ET) so the backup never runs
mid-factory run. As a cron line:

    # 08:30 ET: daily SQLite backup, 2h after the factory window closes.
    # Calls bash directly: cents-wrap always prepends the cents binary, so it
    # cannot run arbitrary commands.
    30 8 * * 1-5 bash -c 'mkdir -p "$HOME/.cents/data/backups" && sqlite3 "$HOME/.cents/data/cents.db" ".backup $HOME/.cents/data/backups/cents-$(date +\%Y\%m\%d).db" && find "$HOME/.cents/data/backups/" -name "cents-*.db" -mtime +14 -delete' >>"$HOME/.cents/logs/backup-db.log" 2>&1

Out of scope for this recipe: off-host backup (S3 / B2 / a second disk),
which the operator can layer on top by rsyncing
~/.cents/data/backups/ to whatever they have. Also out of scope: an
automated restore-from-backup smoke test — operators should manually
verify the latest backup opens cleanly with sqlite3 <backup> '.schema'
at least once per pilot.
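The manual check is easier to repeat if finding the newest backup is scripted. The sketch below assumes a hypothetical helper named latest_backup (not part of cents) and relies on the cents-YYYYMMDD.db naming from the backup recipe, where lexical order matches date order. The commented example uses SQLite's PRAGMA integrity_check as a stricter alternative to .schema; either is a reasonable smoke test.

```shell
#!/usr/bin/env bash
# latest_backup is a hypothetical helper, not part of cents.
# Usage: latest_backup BACKUP_DIR
# Prints the path of the newest cents-YYYYMMDD.db in BACKUP_DIR.
# Returns non-zero if the directory holds no matching file.
latest_backup() {
  local dir="$1" newest="" f
  # Glob expansion is sorted, and date-stamped names sort chronologically,
  # so the last match is the most recent backup.
  for f in "$dir"/cents-*.db; do
    [ -e "$f" ] && newest="$f"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example (not executed here): integrity_check prints "ok" for a healthy file.
# sqlite3 "$(latest_backup "$HOME/.cents/data/backups")" 'PRAGMA integrity_check;'
```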
What is NOT auto-recoverable
- Database file corruption: see the “Backing up pilot data” section above; daily .backup snapshots are the only defence.
- Manual edits to factory.toml mid-experiment (the SHA freeze in cents experiment register doesn’t yet detect drift; see bead cents-eat0)
- Loss of the ~/.cents/data/llm_calls/ blob directory: provenance reconstruction will fail for affected evidence rows. The blob directory is not currently covered by the sqlite3 .backup recipe; rsync it separately if provenance reconstruction matters for your pilot.