ARC, Solomon

The Origin

An AI-native fund architecture where LLMs generate conviction and math manages risk. Built from zero in Claude Code. This is the thesis.

01

The Thesis

Traditional quant excels at structured data — correlations, volatility, mean reversion. But markets increasingly move on narrative. A single policy shift. A change in AI sentiment. A geopolitical escalation.

That signal lives in language, not spreadsheets. LLMs process it natively. Traditional quant structurally underweights it.

AI generates the signals. Math manages the risk.

In a traditional quant fund, math generates the signals and humans manage the risk. Solomon flips the stack. The AI is the edge. The quant layer serves as guardrails, not signal generators.

This mirrors how the best discretionary macro funds operate: deep contextual reasoning about the world, constrained by systematic risk management. Solomon automates the reasoning layer while keeping the risk layer deterministic and auditable.

02

The Council

A single LLM prompt produces a single opinion. Solomon splits reasoning across multiple specialized agents with structural biases, creating adversarial tension that mirrors real fund governance.

A PM wants to buy. Risk wants to sell. The CIO decides within policy constraints. Except the seats are filled by AI, the constraints are enforced by deterministic code, and every decision is journaled.

Scout
Macro Oracle
Reads the economic landscape. Interprets macro data and regime state into structured signals.
Scout
Narrative Intelligence
Detects narrative shifts in real-time. The AI-native edge — finding what quant cannot.
Advocate
Thesis Analyst
Synthesizes all signals. Structural bias: bullish. Finds reasons to deploy capital.
Adversary
Risk Sentinel
The last line of defense. Structural bias: bearish. Has the power to veto the entire session.
VETO SEAT

Models are tiered by function, not thrown at the problem uniformly. Scouts use fast, cheap models. Reasoning agents use mid-tier. The veto seat gets the flagship. The most consequential decision gets the best brain.
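As a sketch, the routing described above reduces to a role-to-tier map. Role names and model labels here are illustrative placeholders, not Solomon's actual configuration:

```python
from enum import Enum

class AgentRole(Enum):
    SCOUT = "scout"          # data transforms: fast, cheap models
    REASONING = "reasoning"  # thesis synthesis: mid-tier models
    VETO = "veto"            # the veto seat: flagship model

# Placeholder model labels; swap in real model identifiers per instance.
MODEL_TIERS = {
    AgentRole.SCOUT: "fast-model",
    AgentRole.REASONING: "mid-tier-model",
    AgentRole.VETO: "flagship-model",
}

def model_for(role: AgentRole) -> str:
    """Route each council seat to its tier: the most consequential
    decision, the veto, gets the most capable model."""
    return MODEL_TIERS[role]
```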

Each agent carries tunable behavioral parameters — sentiment bias, influence weight, conviction limits. These aren't fixed. The system calibrates them over time based on measured accuracy, turning the council from a static architecture into an adaptive one.
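A minimal sketch of such a profile, with hypothetical field names for the parameters the text lists (sentiment bias, influence weight, conviction limits) and one illustrative calibration rule, not Solomon's actual update:

```python
from dataclasses import dataclass, replace

@dataclass
class AgentProfile:
    name: str
    sentiment_bias: float    # -1.0 (bearish) .. +1.0 (bullish)
    influence_weight: float  # how much this agent's view moves the council
    conviction_limit: int    # max conviction change proposed per session

def calibrate(profile: AgentProfile, hit_rate: float) -> AgentProfile:
    """Nudge an agent's influence toward its measured accuracy.
    A 50% hit rate is uninformative; deviation from it earns weight.
    The blend constants are assumptions, not tuned values."""
    target = max(0.1, min(1.0, 2 * abs(hit_rate - 0.5) + 0.1))
    new_weight = 0.8 * profile.influence_weight + 0.2 * target
    return replace(profile, influence_weight=new_weight)
```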

03

The Math

The quant layer doesn't generate alpha. It prevents ruin. Every conviction the AI council produces passes through deterministic risk infrastructure before a single dollar moves.

Regime Detection
HMM
Hidden Markov Model classifies market regime. Position sizing adjusts dynamically based on state.
Downside Risk
VaR
Value at Risk quantifies worst-case portfolio loss. Feeds the Risk Sentinel's veto calculus.
Position Sizing
Kelly
Kelly criterion determines optimal allocation given edge and variance. Prevents over-concentration.
Risk-Adjusted Return
Sharpe
Sharpe ratio measures return per unit of risk. Tracked per theme to identify which convictions pay off.

Hard guardrails on conviction changes, position sizing, turnover, and cash reserves. The Fund Manager and Execution Strategist are pure deterministic code — no LLM can override the risk constraints. The math doesn't negotiate.
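The pieces above can be sketched in a few lines. The formulas are standard (continuous Kelly, one-sided parametric VaR), but every constant, cap, and threshold below is an assumed placeholder, not Solomon's actual limits:

```python
def kelly_fraction(edge: float, variance: float, cap: float = 0.25) -> float:
    """Continuous Kelly: f* = edge / variance, capped (fractional Kelly)
    to prevent over-concentration."""
    if variance <= 0:
        return 0.0
    return max(0.0, min(cap, edge / variance))

def gaussian_var(mu: float, sigma: float, z: float = 1.65) -> float:
    """One-sided 95% parametric VaR under a normal-returns assumption:
    worst expected loss = -(mu - z * sigma), floored at zero."""
    return max(0.0, -(mu - z * sigma))

def apply_guardrails(proposed: float, current: float,
                     max_step: float = 0.05, max_weight: float = 0.15) -> float:
    """Deterministic clamp: per-session position changes are rate-limited
    and total weight is capped. No LLM output can bypass this function."""
    step = max(-max_step, min(max_step, proposed - current))
    return max(0.0, min(max_weight, current + step))
```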

04

8 Themes

Solomon invests thematically, not ticker-by-ticker. Each theme carries a conviction score that determines capital allocation. Why thematic? Because that's where LLM reasoning adds value. Asking AI to predict a stock price is pointless. Asking it whether the AI infrastructure narrative is strengthening — that's worth asking.

AI & Compute · Aggressive
Energy & Grid · Moderate
Defense & Aerospace · Moderate
Quantum Computing · Aggressive
Biotech & Health · Moderate
Real Estate · Moderate
Crypto & Digital · Aggressive
Defensive · Conservative

71 tickers across 8 themes. The AI council adjusts conviction each session. Higher conviction, more capital deployed.
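One plausible mapping from conviction to capital, assuming integer conviction scores and proportional allocation (both are assumptions; the source does not specify the scale or the allocation rule):

```python
def allocate(convictions: dict[str, int], deployable: float) -> dict[str, float]:
    """Map per-theme conviction scores to capital weights proportionally.
    Zero-conviction themes receive nothing; the rest split the deployable
    capital in proportion to conviction."""
    total = sum(convictions.values())
    if total == 0:
        return {theme: 0.0 for theme in convictions}
    return {theme: deployable * c / total for theme, c in convictions.items()}
```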

05

The Battle

Two identical instances. Same architecture. Same themes. Same data. Same guardrails. Same prompts. The only variable is the LLM brain.

Anthropic
Claude Model Family
VS
OpenAI
GPT Model Family

Models matched by function — flagship vs flagship on the veto seat, mid-tier vs mid-tier on reasoning, fast vs fast on data transforms. Each instance has its own paper trading account, its own decision journal, and its own rate limiter.

This isn't a benchmark on test questions. It's a benchmark on capital allocation. Separate accounts. Real market data. Side by side. The model family that demonstrates better reasoning over time earns the right to manage real capital.

06

The Loop

The weekly review is where it gets recursive. Inspired by the autoresearch pattern — a closed feedback loop where the system reviews its own decisions, measures errors, and calibrates future sessions.

Each council's flagship model reviews its own week of decisions. What worked. What didn't. Where the agents were systematically wrong. Not just outcome analysis — reasoning analysis.

Between reviews, every session carries institutional memory. Agents see prior vetoes, contested themes, and conviction shifts — the council doesn't restart from zero each time. The Risk Sentinel knows what it flagged last session. The Thesis Analyst knows what got blocked and why.

Weeks 1–2
Clean baseline. Pure reasoning comparison. No self-modification. Establish how each model family thinks about risk, narrative, and conviction from zero.
Week 3+
Close the loop. Each council calibrates its own parameters based on realized performance. The system doesn't just decide — it learns to decide better.
Recursive Self-Calibration
Agent accuracy tracking. Systematic bias detection. The decision journal becomes the training data for the system's own improvement.
The decision journal is the dataset. It's not just about who makes more money — it's about understanding how each model thinks about risk.
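A sketch of how journal entries might feed systematic bias detection. The schema is hypothetical; the source only states that decisions and outcomes are journaled:

```python
from statistics import mean

# Hypothetical journal rows: each records an agent's directional stance
# (+1 bullish, -1 bearish) and the realized direction of the move.
journal = [
    {"agent": "Advocate", "direction": +1, "realized": +1},
    {"agent": "Advocate", "direction": +1, "realized": -1},
    {"agent": "Adversary", "direction": -1, "realized": -1},
]

def agent_bias(entries: list[dict], agent: str) -> float:
    """Mean signed error between an agent's stance and the realized move.
    A persistently positive value flags systematic over-bullishness,
    persistently negative flags over-bearishness."""
    errs = [e["direction"] - e["realized"] for e in entries if e["agent"] == agent]
    return mean(errs) if errs else 0.0
```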

07

Session Trace

Theory is cheap. Here's the system running. One council session from an early test — real market data, real agent reasoning, real veto.

Regime BEAR — 100% confidence, 214-day persistence, 99.8% stay probability
Risk Score 0.76 — Veto Territory
Risk Sentinel VETO — Portfolio structurally overweight high-beta growth in confirmed downturn. 24 holdings breaching drawdown tolerance. Conviction increases blocked.
Contested Energy & Grid (Thesis +1, Risk vetoed) · Defense (Thesis +1, Risk vetoed) · Defensive (Thesis +2, Risk vetoed)
Applied Quantum −1 · Crypto −1 · Real Estate −1. Only conviction decreases permitted.
Cost $0.39 total — Scouts $0.09, Reasoning $0.08, Veto Seat $0.22

The Thesis Analyst wanted to increase conviction on three themes. The Risk Sentinel vetoed all of them. The math didn't negotiate. This is the governance working as designed.
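The gate implied by this trace can be sketched as deterministic code. The 0.75 threshold is an illustrative assumption, chosen only because the trace's 0.76 score lands in veto territory:

```python
def gate_session(risk_score: float, proposals: dict[str, int],
                 veto_threshold: float = 0.75) -> dict[str, int]:
    """Deterministic veto gate: below the threshold all conviction
    changes pass; at or above it, increases are dropped and only
    conviction decreases are applied."""
    if risk_score < veto_threshold:
        return dict(proposals)
    return {theme: delta for theme, delta in proposals.items() if delta < 0}
```

Run against the session above, the gate blocks the three proposed increases and passes the three decreases, reproducing the "Applied" line.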

Built With
Co-authored with Claude Code
Every line of architecture, every agent prompt, every risk model, and this page — AI-driven development from thesis to deployment.
Watch the Battle →