Scheduling

A pre-registered forward test only earns its labeled-outcomes dataset if the factory actually runs every trading day. This page is the operator’s recipe for that — what to schedule, what order it has to run in, and the two pitfalls (PATH and editable-install drift) that bite first.

If you haven’t pre-registered the experiment yet, do that first via cents experiment register. The schedule below assumes an active experiment is already in place.

For a two-arm forward test, the following commands run on a cadence:

| Command | Cadence | Why |
| --- | --- | --- |
| cents event refresh | Daily, before open (e.g. 06:00 ET) | Pulls Federal Register events; fires PREMISE_INVALIDATION alerts against any open thesis whose premise_tags intersect an event’s tags. |
| cents factory run | Daily, after the refresh (e.g. 06:30 ET) | The LLM arm: the real multi-agent stack. Stamps orchestrator_label = "llm" on every thesis it opens. |
| cents factory run --orchestrator random --orchestrator-seed N | Daily, same window, separate invocation | The control arm. Uniform-random conviction, no LLM calls. Stamps orchestrator_label = "random". Without this you cannot tell whether the LLM arm produced signal or rode the universe. |
| cents shadow backfill --horizon 30 | Daily (or weekly) | Walks the shadow_opens table past the 30-day horizon and fills forward returns from price history. This is what gives you a baseline against rejected candidates. |
| cents universe ingest-delistings | Weekly | Keeps the delistings table current. Forward runs are unbiased today; the longer this has been running, the less retrospective survivorship bias bleeds into back-dated universe reconstructions. See Scope. |
| cents experiment status --output json | Daily, appended to a log | Lets you watch verdict_ready and opened_by_arm evolve over the 90 days without having to remember to check. |
| cents eval run --persist-history + cents eval drift-check | Daily, after factory runs | Persists today’s eval metrics and fires a MODEL_DRIFT alert if F1 falls more than 5pp below the trailing-7 median. Cheap insurance against silent classifier regressions across model snapshot bumps. |

Order matters in one place: event refresh must complete before factory run, otherwise the day’s events haven’t had a chance to invalidate stale theses before the open phase considers new ones.
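The 30-minute gap in the example schedule enforces this ordering only by timing. One way to make it explicit is to chain the steps so each one starts only after the previous one exits cleanly. The following is a minimal sketch, not part of cents: `run_chain` is a hypothetical helper, and whether a failed refresh should block that day's factory run is a judgment call for the operator.

```shell
#!/usr/bin/env bash
# run_chain: run each argument as a command line, in order.
# Stops at the first failure and returns that command's exit status.
# Hypothetical helper for illustration; not part of cents.
run_chain() {
  local step status
  for step in "$@"; do
    echo "running: $step"
    status=0
    bash -c "$step" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "failed ($status): $step" >&2
      return "$status"
    fi
  done
}

# Example, commented out so the sketch is safe to source:
# run_chain \
#   "$HOME/.cents/bin/cents-wrap event-refresh event refresh" \
#   "$HOME/.cents/bin/cents-wrap factory-llm factory run --max-cost-usd 10"
```

With this shape a single cron entry replaces the two time-separated ones, at the cost of losing the day's LLM-arm run whenever the refresh fails.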

A wrapper script makes the cron entries readable and gives one place to manage PATH, the timezone, and the path to your editable install.

~/.cents/bin/cents-wrap (mark it executable with chmod +x):

#!/usr/bin/env bash
# Wrapper so cron / launchd dispatch to the right cents binary
# regardless of which working tree is currently checked out.
set -euo pipefail
# Anchor schedules to US market hours even on a non-US machine.
export TZ="America/New_York"
# Use the venv's cents directly — never rely on PATH lookup from cron.
# Adjust to wherever your editable install lives.
CENTS_BIN="$HOME/.venvs/cents/bin/cents"
# API keys: read from ~/.cents/config.toml at runtime. If you'd rather
# keep them out of the config file, export them here instead:
# export ANTHROPIC_API_KEY="..."
# export FMP_API_KEY="..."
# Per-day log directory.
LOG_DIR="$HOME/.cents/logs/$(date +%Y-%m-%d)"
mkdir -p "$LOG_DIR"
CMD_NAME="$1"; shift
exec "$CENTS_BIN" "$@" >>"$LOG_DIR/${CMD_NAME}.log" 2>&1

Then crontab -e:

# m h dom mon dow command
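# Time zone: plain TZ only sets the environment of the jobs. cronie also
# honors CRON_TZ for the schedule itself; other cron implementations fire in
# the system time zone, so on a non-ET host check `man 5 crontab` and convert
# the hour fields below if needed.
CRON_TZ=America/New_York
TZ=America/New_York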
# 06:00 ET — pull policy events, fire premise-invalidation alerts.
0 6 * * 1-5 ~/.cents/bin/cents-wrap event-refresh event refresh
# 06:30 ET — LLM arm. Cost-capped so a hung loop can't run unbounded charges.
30 6 * * 1-5 ~/.cents/bin/cents-wrap factory-llm factory run --max-cost-usd 10
# 06:35 ET — control arm. Same cadence, different orchestrator, fixed seed for repro.
35 6 * * 1-5 ~/.cents/bin/cents-wrap factory-random factory run --orchestrator random --orchestrator-seed 42
# 07:00 ET — backfill 30-day forward returns for rejected candidates.
0 7 * * 1-5 ~/.cents/bin/cents-wrap shadow shadow backfill --horizon 30
# Sunday 04:00 ET — refresh the delistings table.
0 4 * * 0 ~/.cents/bin/cents-wrap delistings universe ingest-delistings
# 18:00 ET — daily experiment-status snapshot for trend-watching.
0 18 * * 1-5 ~/.cents/bin/cents-wrap status experiment status --output json
# 18:30 ET — eval harness against golden sets + drift check.
# Persists today's metrics, then fires MODEL_DRIFT alert if F1 fell >5pp
# below the trailing-7 median. Cheap insurance against silent classifier
# regressions when the upstream Haiku snapshot bumps.
30 18 * * 1-5 ~/.cents/bin/cents-wrap eval-run eval run --persist-history
35 18 * * 1-5 ~/.cents/bin/cents-wrap eval-drift eval drift-check

Logs land at ~/.cents/logs/YYYY-MM-DD/<command>.log. The per-day directory makes failed-day archaeology trivial (ls ~/.cents/logs/ is your run history).
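Because every command writes to its own file inside a dated directory, a short loop can surface the days on which a given job produced nothing. This is a sketch under the layout described above; `days_missing_log` is a hypothetical helper, and it only sees days on which at least one wrapped command ran (a day with no directory at all will not be listed).

```shell
#!/usr/bin/env bash
# days_missing_log ROOT NAME
# Prints each per-day directory under ROOT where NAME.log is absent or empty.
# Hypothetical helper for illustration; not part of cents.
days_missing_log() {
  local root="$1" name="$2" day
  for day in "$root"/*/; do
    [ -d "$day" ] || continue            # glob matched nothing
    day="${day%/}"
    if [ ! -s "$day/$name.log" ]; then   # -s: exists and is non-empty
      basename "$day"
    fi
  done
}

# Example: days_missing_log "$HOME/.cents/logs" factory-random
```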

launchd keeps running across reboots and doesn’t need a logged-in shell. Drop these under ~/Library/LaunchAgents/ and load each one with launchctl load -w <plist>.

~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>ai.dollars-and-cents.factory-llm</string>
<key>ProgramArguments</key>
<array>
<string>/Users/YOU/.cents/bin/cents-wrap</string>
<string>factory-llm</string>
<string>factory</string>
<string>run</string>
<string>--max-cost-usd</string>
<string>10</string>
</array>
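<!-- 06:30 local time, weekdays only. launchd fires StartCalendarInterval in the system time zone; the TZ set below and in the wrapper only affects the process environment, so convert Hour on a non-ET machine. -->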
<key>StartCalendarInterval</key>
<array>
<dict><key>Weekday</key><integer>1</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
<dict><key>Weekday</key><integer>2</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
<dict><key>Weekday</key><integer>3</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
<dict><key>Weekday</key><integer>4</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
<dict><key>Weekday</key><integer>5</integer><key>Hour</key><integer>6</integer><key>Minute</key><integer>30</integer></dict>
</array>
<!-- Don't run on load — only when the calendar interval triggers. -->
<key>RunAtLoad</key>
<false/>
<key>StandardOutPath</key>
<string>/Users/YOU/.cents/logs/launchd-factory-llm.out</string>
<key>StandardErrorPath</key>
<string>/Users/YOU/.cents/logs/launchd-factory-llm.err</string>
<key>EnvironmentVariables</key>
<dict>
<key>TZ</key>
<string>America/New_York</string>
<key>PATH</key>
<string>/usr/local/bin:/usr/bin:/bin</string>
</dict>
</dict>
</plist>

Make one plist per scheduled command (factory-llm, factory-random, event-refresh, shadow-backfill, delistings, experiment-status, eval-run, eval-drift), varying the Label, ProgramArguments, StartCalendarInterval, and log paths. The wrapper script handles the rest.
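The five near-identical StartCalendarInterval entries are the tedious part of writing each plist by hand. A small generator can print them for any hour and minute; this is an illustrative sketch only (`weekday_intervals` is a hypothetical helper), and its output still has to be pasted inside the `<array>` of the plist.

```shell
#!/usr/bin/env bash
# weekday_intervals HOUR MINUTE
# Prints one StartCalendarInterval <dict> per weekday
# (launchd numbering: 1 = Monday ... 5 = Friday).
# Hypothetical helper for illustration; not part of cents.
weekday_intervals() {
  # 10# forces base 10 so inputs like 08 are not parsed as octal.
  local hour=$((10#$1)) minute=$((10#$2)) day
  for day in 1 2 3 4 5; do
    printf '<dict><key>Weekday</key><integer>%d</integer><key>Hour</key><integer>%d</integer><key>Minute</key><integer>%d</integer></dict>\n' \
      "$day" "$hour" "$minute"
  done
}

# Example: weekday_intervals 6 35   # entries for the 06:35 control-arm run
```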

Load and verify:

launchctl load -w ~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist
launchctl list | grep dollars-and-cents
# To unload (e.g. before editing the plist):
launchctl unload ~/Library/LaunchAgents/ai.dollars-and-cents.factory-llm.plist

The -w flag persists the load state across reboots.

Five quick checks. Run them on day 2 of any new schedule, before you trust it to run unattended for 90.

# 1. The latest run actually executed and produced output.
tail -F ~/.cents/logs/$(date +%Y-%m-%d)/factory-llm.log
# 2. Both arms are accumulating theses against the active experiment.
cents experiment status --output json | jq .opened_by_arm
# Expect something like: { "llm": 12, "random": 11 } — counts should grow daily.
# 3. Premise invalidation is firing when events match an open thesis.
cents alert list --unread
# 4. The labeled-outcomes dataset is populating across discovery / cohort / regime.
cents factory analyze --by cohort
cents factory analyze --by discovery,regime
# 5. Spend is where you expect — well under any --max-cost-usd cap.
cents usage headroom
# Expect: "Status: ok". "approaching_cap" or "hit_cap" means tune the cap up.
# Per-agent attribution: cents usage summary --by agent

If opened_by_arm shows only "llm", the random-orchestrator cron entry isn’t firing — check its log, then check that the experiment is still active (cents experiment list).

The schedule produces the labeled outcomes dataset. Whether you should trust the cohort numbers it generates is a separate question, answered by Scope — survivorship coverage, lookahead audit results, calibration honesty, and the gates the engine deliberately does not apply. Read that page before you start drawing conclusions from whatever the 90-day window leaves behind.

Troubleshooting — what to do when things break

A 30-90 day cron-driven pilot will hit failure modes you won’t see in a one-shot run. Here’s what to expect and how to handle each.

The factory engine has two layers of timeout protection (see CLAUDE.md):

  • Per-Anthropic-call: 30s read timeout (overridable via CENTS_ANTHROPIC_TIMEOUT_SEC). Worst case ~106s per call after retries + backoff.
  • Per-symbol watchdog: 90s deadline on the entire orchestrator-research call (overridable via CENTS_PER_SYMBOL_DEADLINE_SEC). On expiry the symbol is logged + skipped via symbols_timed_out in summary_json; the run continues.

So a run that hangs >15 min is almost certainly something the watchdogs can’t reach — NewsAPI / FMP / Alpaca network stall, a database lock, or a wedged file handle. Identify with:

# Is the process active?
ps aux | grep "cents factory"
# What sockets does it hold? (open TCP connections often surface the culprit)
# -d, joins multiple PIDs with commas, which is the list format lsof -p expects.
lsof -p "$(pgrep -d, -f "cents factory")" | grep TCP
# What was the last LLM activity?
sqlite3 ~/.cents/data/cents.db \
"SELECT operation, MAX(called_at) FROM llm_usage GROUP BY operation"

If the run has been hung for more than 30 minutes, send it SIGINT (Ctrl-C in a terminal, or kill -INT with the process ID). The engine catches KeyboardInterrupt cleanly and persists the partial run with error set in factory_runs. Cron should not retry the run within the same minute.

Cost cap hit mid-loop — partial runs and cohort bias

When pre-call cost estimation projects the next LLM call would exceed --max-cost-usd (per-run) or max_llm_spend_usd_per_day (daily), CostCapExceeded is raised and the engine aborts. The partial run is persisted with error = "cost_cap_exceeded".

This biases the cohort. Symbols evaluated before the abort tend to be the most conviction-heavy, since the engine processes the universe in shuffled order and the first 5 above-threshold names cost the most LLM time on premise classification. The engine does NOT keep a cursor between runs: the next day’s factory run reshuffles the universe with a fresh seed and walks it from the top, so symbols that got cut off yesterday are not preferentially picked up today.

Net effect: days that hit the cap contribute fewer opens than days that finished, and the opens they DO contribute skew toward the high-conviction tail. Operators reading cents factory analyze --by discovery will not see this. It shows up only in cents factory analyze --by orchestrator, as a thinner LLM-arm opened count on days that hit the cap.

For the 30-day pilot, pick ONE of these strategies before the first scheduled run:

  • Size the cap from a wide dry-run first. Run cents factory run --dry-run over the actual pilot universe and watch the projected cost. Set --max-cost-usd ~20% above the observed peak so noise won’t trip it.
  • Set the per-run cap loosely and rely on the daily cap as the real ceiling. This trades a chance of one expensive run for guaranteed full-universe coverage on the days that complete; the daily cap still bounds total spend if something pathological happens.
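The first strategy is simple arithmetic: take the peak projected cost from the dry run and add headroom. A minimal sketch follows; `suggest_cap` is a hypothetical helper, and the 0.20 input in the example is just the approximate per-run figure quoted on this page, not a measured value.

```shell
#!/usr/bin/env bash
# suggest_cap PEAK_USD [HEADROOM_PCT]
# Prints PEAK_USD raised by HEADROOM_PCT percent (default 20), to two decimals.
# Hypothetical helper for illustration; not part of cents.
suggest_cap() {
  local peak="$1" headroom="${2:-20}"
  awk -v p="$peak" -v h="$headroom" 'BEGIN { printf "%.2f\n", p * (1 + h / 100) }'
}

# Example: suggest_cap 0.20   # prints 0.24
```

The printed value is what you would pass to --max-cost-usd; rounding to whole cents is a presentation choice, not something the CLI is known to require.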

If you see frequent cost-cap aborts mid-pilot:

  • Raise the cap (a $10/day ceiling is well above observed ~$0.20/run)
  • Move to a tighter universe so per-run cost is bounded
  • Investigate why one run blew past expectations (cents usage summary --by operation --since 1d will show which agent spiked)

cents agents are designed to fail soft:

  • Anthropic outage: sentiment falls back to keyword scoring; premise classifier returns [] (the no-thesis path keeps tagger_failed events surfaced as WARNING logs so it isn’t silently dropped). Hit rate will look worse for that day but the run completes.
  • FMP outage: fundamentals + moat agents return zero signal for the affected symbols. Affected symbols silently drop out of the cohort — not great but bounded.
  • NewsAPI outage: sentiment returns zero signal. Same shape as FMP.
  • Alpaca outage: the run will hang on get_latest_price calls; watchdog catches at 90s. Symbols with bad price data are skipped.

The daily eval-harness drift check (cents eval drift-check) is the backstop — if F1 falls >5pp below the trailing-7 median, a MODEL_DRIFT alert fires regardless of which API was the root cause.
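To make the drift rule concrete, here is a sketch of the comparison it describes. This is not the code behind cents eval drift-check: `median` and `is_drift` are hypothetical helpers, F1 is assumed to be on a 0 to 1 scale (so 5pp is 0.05), and the trailing window is assumed to be the seven days before today.

```shell
#!/usr/bin/env bash
# median: prints the median of its numeric arguments.
median() {
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}

# is_drift TODAY_F1 THRESHOLD PRIOR_F1...
# Exits 0 (drift) when TODAY_F1 is more than THRESHOLD below the median
# of the prior values, and 1 otherwise.
is_drift() {
  local today="$1" threshold="$2" med
  shift 2
  med=$(median "$@")
  awk -v t="$today" -v m="$med" -v th="$threshold" 'BEGIN { exit !((m - t) > th) }'
}

# Example: trailing median is 0.69, so 0.63 today is 6pp below it.
# is_drift 0.63 0.05 0.70 0.68 0.66 0.71 0.69 0.67 0.72 && echo "MODEL_DRIFT"
```

Using the median rather than the mean means a single bad day inside the trailing window does not move the reference point much.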

Every factory run writes a row to factory_runs:

sqlite3 ~/.cents/data/cents.db \
"SELECT id, started_at, completed_at, theses_opened, llm_cost_usd, error, summary_json
FROM factory_runs ORDER BY started_at DESC LIMIT 5"

For the most-recent run, look at:

  • error — populated on cost-cap, exception, or kill-signal aborts
  • summary_json.stop_reason: one of max_new_per_run (normal), end_of_universe (cap raised too high or universe shrunk), cost_cap (see above), kill_switch (future use)
  • summary_json.symbols_timed_out — non-zero means the per-symbol watchdog fired; check logs for which symbols
  • LLM call provenance via cents evidence trace <evidence_id> — reconstructs the prompt/response/cost for any specific decision

A bare cron entry is silent on failure. Wrap each run in a script that captures exit status and surfaces anomalies:

cents-run-wrapped.sh
#!/usr/bin/env bash
set -o pipefail
LOG_DIR=~/.cents/logs/$(date +%Y-%m-%d)
mkdir -p "$LOG_DIR"
LLM_LOG="$LOG_DIR/factory-llm.log"
RANDOM_LOG="$LOG_DIR/factory-random.log"
# Alert recipient (placeholder; set to your own address).
ALERT_EMAIL="you@example.com"
cents factory run --max-cost-usd 10.00 > "$LLM_LOG" 2>&1 \
|| mail -s "[cents] LLM-arm run FAILED $(date)" "$ALERT_EMAIL" < "$LLM_LOG"
cents factory run --orchestrator random \
--orchestrator-seed "$(date +%s)" > "$RANDOM_LOG" 2>&1 \
|| mail -s "[cents] random-arm run FAILED $(date)" "$ALERT_EMAIL" < "$RANDOM_LOG"
# Drift check at the end; emails on regression.
cents eval run --persist-history --gate \
--baseline-f1 0.66 --baseline-brier 0.06 --tolerance-pp 3 \
|| mail -s "[cents] eval drift detected $(date)" "$ALERT_EMAIL" < /dev/null

Replace mail with whatever notification channel you use (Slack webhook, ntfy, healthchecks.io ping, etc.).
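If you expect to swap channels, it can help to separate "run and capture" from "notify" so that only one function changes. The sketch below is one possible shape, not something cents provides: `notify` and `run_or_notify` are hypothetical names, and the default notifier just prints to stderr.

```shell
#!/usr/bin/env bash
# notify SUBJECT LOGFILE
# Default notifier: prints to stderr. Replace the body with mail, a webhook
# call, or any other channel.
notify() {
  local subject="$1" logfile="$2"
  echo "ALERT: $subject (see $logfile)" >&2
}

# run_or_notify LABEL LOGFILE CMD [ARGS...]
# Runs CMD with stdout and stderr captured to LOGFILE. On a non-zero exit,
# calls notify and returns the same exit status.
run_or_notify() {
  local label="$1" logfile="$2" status=0
  shift 2
  "$@" >"$logfile" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    notify "[cents] $label FAILED (exit $status) $(date)" "$logfile"
  fi
  return "$status"
}

# Example, commented out so the sketch is safe to source:
# run_or_notify "LLM-arm run" "$LLM_LOG" cents factory run --max-cost-usd 10.00
```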

Every eval call records the upstream model_snapshot (e.g. claude-haiku-4-5-20260301). When Anthropic ships a new minor version of the Haiku family mid-pilot, that string changes and cents eval run will start scoring against an effectively different classifier — the gold-set numbers from before the bump are no longer comparable to the numbers after it. cents eval drift-check will likely fire a MODEL_DRIFT alert on the first post-bump day because F1 has shifted relative to the trailing-7 median. This is expected, not a regression — but the runbook is different from “actual drift” and the two look identical at first glance.

When a MODEL_DRIFT alert fires:

  1. Check whether model_snapshot changed in the trailing window:

    sqlite3 ~/.cents/data/cents.db \
    "SELECT DISTINCT model, COUNT(*) FROM llm_usage
    WHERE called_at >= datetime('now', '-7 days')
    GROUP BY model ORDER BY MIN(called_at)"

    If you see two distinct snapshot strings in the trailing 7 days, the bump is the cause.

  2. If the snapshot changed: reset the baseline so subsequent days compare apples to apples, and clear the trailing history so the pre-bump days don’t pollute the median:

    # Re-run against the new snapshot and persist a fresh baseline.
    cents eval run --persist-baseline
    # Archive then clear the pre-bump history so drift-check has a clean window.
    mv ~/.cents/data/eval_history ~/.cents/data/eval_history-pre-$(date +%Y%m%d)
    mkdir -p ~/.cents/data/eval_history

    Record the reset date and the old/new snapshot strings in the experiment notes — that’s the cohort-analytics record that someone reading the pilot results six months later will need to interpret a step change in eval metrics partway through the window.

  3. If the snapshot did NOT change: this is real drift. Investigate (upstream API behaviour, gold-set data, recently-merged classifier prompt changes) before re-baselining.
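The branch between steps 2 and 3 comes down to counting distinct snapshot strings in the trailing window. As a rough sketch, assuming the model names arrive one per line (for example from the first column of the llm_usage query in step 1), a hypothetical `classify_drift` helper could label the alert:

```shell
#!/usr/bin/env bash
# classify_drift: reads one model snapshot string per line on stdin.
# Prints "snapshot_bump" if more than one distinct non-blank value appears,
# otherwise "real_drift".
# Hypothetical helper for illustration; not part of cents.
classify_drift() {
  local distinct
  distinct=$(awk 'NF { seen[$0] = 1 } END { n = 0; for (k in seen) n++; print n }')
  if [ "$distinct" -gt 1 ]; then
    echo "snapshot_bump"
  else
    echo "real_drift"
  fi
}

# Example, commented out so the sketch is safe to source:
# sqlite3 ~/.cents/data/cents.db \
#   "SELECT DISTINCT model FROM llm_usage
#    WHERE called_at >= datetime('now', '-7 days')" | classify_drift
```

A "real_drift" label only means the snapshot string did not change; it is a prompt to investigate, not a diagnosis.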

Plan for at least one such bump in any 30-day pilot — Anthropic typically refreshes Haiku snapshots monthly. The eval harness staying green across the bump is what makes the rest of the experiment results defensible.

The 30-day pilot’s entire experiment dataset — theses, evidence, llm_usage, outcomes, experiment registry, eval history — lives in a single SQLite file at ~/.cents/data/cents.db. One disk failure, accidental rm, or filesystem corruption is total loss of the labeled outcomes dataset and there is no way to reconstruct it after the fact. Back it up daily, separately from the factory cron.

# Daily SQLite backup. The .backup dot-command is the safe way to copy a live
# database: a plain cp while a factory run is mid-flight can produce a corrupt
# copy because WAL pages may be in flight. .backup acquires the right locks
# and writes a consistent snapshot.
mkdir -p ~/.cents/data/backups
# $HOME rather than ~ below: the shell does not expand ~ inside double quotes.
sqlite3 ~/.cents/data/cents.db \
".backup $HOME/.cents/data/backups/cents-$(date +%Y%m%d).db"
# Optional: retain the trailing 14 days, drop everything older.
find ~/.cents/data/backups/ -name 'cents-*.db' -mtime +14 -delete

Schedule the backup at least 2 hours offset from the factory cron (e.g. factory at 06:30 ET, backup at 08:30 ET) so the backup never runs mid-factory run. As a cron line:

# 08:30 ET: daily SQLite backup, 2h after the factory window closes.
# Called directly rather than via cents-wrap, which always execs the cents binary.
30 8 * * 1-5 bash -c 'mkdir -p "$HOME/.cents/data/backups" && sqlite3 "$HOME/.cents/data/cents.db" ".backup $HOME/.cents/data/backups/cents-$(date +\%Y\%m\%d).db" && find "$HOME/.cents/data/backups/" -name "cents-*.db" -mtime +14 -delete'

Out of scope for this recipe: off-host backup (S3 / B2 / a second disk), which the operator can layer on top by rsyncing ~/.cents/data/backups/ to whatever they have. Also out of scope: an automated restore-from-backup smoke test — operators should manually verify the latest backup opens cleanly with sqlite3 <backup> '.schema' at least once per pilot.

  • Database file corruption — see the “Backing up pilot data” section above; daily .backup snapshots are the only defence.
  • Manual edits to factory.toml mid-experiment (the SHA freeze in cents experiment register doesn’t yet detect drift — see bead cents-eat0)
  • Loss of the ~/.cents/data/llm_calls/ blob directory — provenance reconstruction will fail for affected evidence rows. The blob directory is not currently covered by the sqlite3 .backup recipe; rsync it separately if provenance reconstruction matters for your pilot.
Not financial advice. Cents is an educational and research tool for tracking your own investment theses. Outputs are model-generated and may be inaccurate. You are solely responsible for your own investment decisions.