The Thesis
Traditional quant excels at structured data — correlations, volatility, mean reversion. But markets increasingly move on narrative. A single policy shift. A change in AI sentiment. A geopolitical escalation.
That signal lives in language, not spreadsheets. LLMs can process it. Quant models structurally cannot.
In a traditional quant fund, math generates the signals and humans manage the risk. Solomon flips the stack. The AI is the edge. The quant layer serves as the guardrail, not the signal generator.
This mirrors how the best discretionary macro funds operate: deep contextual reasoning about the world, constrained by systematic risk management. Solomon automates the reasoning layer while keeping the risk layer deterministic and auditable.
The Council
A single LLM prompt produces a single opinion. Solomon splits reasoning across multiple specialized agents with structural biases, creating adversarial tension that mirrors real fund governance.
A PM wants to buy. Risk wants to sell. The CIO decides within policy constraints. Except the seats are filled by AI, the constraints are enforced by deterministic code, and every decision is journaled.
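A minimal sketch of that flow, under loud assumptions: the agents are stubbed out as plain data, the seat labels, the `decide` function, the 0-to-1 conviction scale, and the policy bounds are all hypothetical, and the journal is just a list of JSON strings. It only illustrates the shape of the idea: opinions disagree, the CIO picks, deterministic code clamps the pick to policy, and the whole record is journaled.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Opinion:
    seat: str          # hypothetical seat label, e.g. "pm" or "risk"
    conviction: float  # proposed conviction for one theme, assumed in [0, 1]
    rationale: str


def decide(theme, opinions, cio_pick, policy_min, policy_max, journal):
    """Clamp the CIO's pick to the policy bounds and journal the full record."""
    final = min(max(cio_pick, policy_min), policy_max)
    journal.append(json.dumps({
        "theme": theme,
        "opinions": [asdict(o) for o in opinions],
        "cio_pick": cio_pick,
        "final": final,
        "clamped": final != cio_pick,
    }))
    return final


journal = []
final = decide(
    "ai_infrastructure",
    [
        Opinion("pm", 0.9, "narrative strengthening"),
        Opinion("risk", 0.4, "crowded positioning"),
    ],
    cio_pick=0.85,
    policy_min=0.0,
    policy_max=0.7,   # illustrative policy cap, not a value from Solomon
    journal=journal,
)
```

In this sketch the CIO's pick exceeds the policy cap, so the clamp wins and the journal entry records that it did.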
Models are tiered by function, not thrown at the problem uniformly. Scouts use fast, cheap models. Reasoning agents use mid-tier. The veto seat gets the flagship. The most consequential decision gets the best brain.
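Tiering by function can be pictured as a two-step lookup: seat to tier, tier to model. The sketch below assumes three seat labels taken from the description above; the `Tier` enum, the `model_for_seat` helper, and the model identifiers are placeholders, not Solomon's actual configuration.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"
    MID = "mid"
    FLAGSHIP = "flagship"


# Seat labels follow the text; the exact seat names in Solomon are an assumption.
SEAT_TIERS = {
    "scout": Tier.FAST,
    "reasoning": Tier.MID,
    "veto": Tier.FLAGSHIP,
}

# Placeholder identifiers, deliberately not real model names.
TIER_MODELS = {
    Tier.FAST: "fast-model-placeholder",
    Tier.MID: "mid-model-placeholder",
    Tier.FLAGSHIP: "flagship-model-placeholder",
}


def model_for_seat(seat: str) -> str:
    """Resolve a council seat to the model serving its tier."""
    try:
        tier = SEAT_TIERS[seat]
    except KeyError:
        raise ValueError(f"unknown seat: {seat!r}") from None
    return TIER_MODELS[tier]
```

Keeping the seat-to-tier map separate from the tier-to-model map means swapping a model family touches one table and leaves the council's structure alone.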
The Math
The quant layer doesn't generate alpha. It prevents ruin. Every conviction the AI council produces passes through deterministic risk infrastructure before a single dollar moves.
Hard guardrails on conviction changes, position sizing, turnover, and cash reserves. The Fund Manager and Execution Strategist are pure deterministic code — no LLM can override the risk constraints. The math doesn't negotiate.
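What "the math doesn't negotiate" might look like in code: a sketch of two of the four guardrails, assuming convictions on a 0-to-1 scale and portfolio weights that sum to at most 1. The `Guardrails` fields, the function names, and every numeric limit are illustrative placeholders, not Solomon's real parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    # All limits below are illustrative placeholders.
    max_conviction_step: float = 0.10  # max conviction change per session
    max_turnover: float = 0.20         # max one-way turnover per session


def clamp_conviction(previous: float, proposed: float, rails: Guardrails) -> float:
    """Limit how far the council can move a conviction in one session."""
    lo = max(0.0, previous - rails.max_conviction_step)
    hi = min(1.0, previous + rails.max_conviction_step)
    return min(max(proposed, lo), hi)


def limit_turnover(current: dict, target: dict, rails: Guardrails) -> dict:
    """Scale the move toward target weights so one-way turnover stays capped."""
    keys = set(current) | set(target)
    deltas = {k: target.get(k, 0.0) - current.get(k, 0.0) for k in keys}
    turnover = 0.5 * sum(abs(d) for d in deltas.values())
    # Move fully if within the cap, otherwise shrink every trade proportionally.
    scale = 1.0 if turnover <= rails.max_turnover else rails.max_turnover / turnover
    return {k: current.get(k, 0.0) + scale * deltas[k] for k in keys}
```

Both functions are pure: same inputs, same outputs, no model call anywhere in the path. That is the property that makes the layer auditable.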
8 Themes
Solomon invests thematically, not ticker-by-ticker. Each theme carries a conviction score that determines capital allocation. Why thematic? Because that's where LLM reasoning adds value. Asking AI to predict a stock price is pointless. Asking it whether the AI infrastructure narrative is strengthening — that's worth asking.
71 tickers across 8 themes. The AI council adjusts conviction each session. Higher conviction, more capital deployed.
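One simple way to turn conviction into capital is proportional weighting after holding back a cash reserve. This is an assumption for illustration: the text only says higher conviction means more capital, not that the mapping is linear. The `allocate` function, the 10% default reserve, and the theme names in the usage lines are hypothetical.

```python
def allocate(convictions: dict, cash_reserve: float = 0.10) -> dict:
    """Split the investable fraction across themes in proportion to conviction."""
    if not 0.0 <= cash_reserve < 1.0:
        raise ValueError("cash_reserve must be in [0, 1)")
    if any(c < 0 for c in convictions.values()):
        raise ValueError("convictions must be non-negative")
    total = sum(convictions.values())
    if total == 0:
        # No conviction anywhere: stay entirely in cash.
        return {theme: 0.0 for theme in convictions}
    investable = 1.0 - cash_reserve
    return {theme: investable * c / total for theme, c in convictions.items()}


# Hypothetical theme names and conviction scores.
weights = allocate({"ai_infrastructure": 3.0, "energy": 1.0}, cash_reserve=0.20)
```

Here the first theme carries three times the conviction of the second, so it receives three times the capital out of the 80% that is investable.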
The Battle
Two identical instances. Same architecture. Same themes. Same data. Same guardrails. Same prompts. The only variable is the LLM brain.
Models are matched by function: flagship vs flagship on the veto seat, mid-tier vs mid-tier on reasoning, fast vs fast on data transforms. Each instance has its own paper trading account, its own decision journal, and its own rate limiter.
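The controlled-experiment setup can be sketched as two configs that are allowed to differ only in the model family and in per-instance resources. Everything named here is hypothetical: the `InstanceConfig` fields, the version strings, and the `is_fair_matchup` check are one way to express the idea, and the rate limiter is left out for brevity.

```python
from dataclasses import dataclass, replace, asdict


@dataclass(frozen=True)
class InstanceConfig:
    model_family: str       # the one experimental variable
    account_id: str         # separate paper trading account per instance
    journal_path: str       # separate decision journal per instance
    themes: tuple           # shared across instances
    prompt_version: str     # shared across instances
    guardrail_version: str  # shared across instances


# Fields that are expected to differ between the two instances.
PER_INSTANCE = {"model_family", "account_id", "journal_path"}


def is_fair_matchup(a: InstanceConfig, b: InstanceConfig) -> bool:
    """True when every shared field is identical across the two instances."""
    da, db = asdict(a), asdict(b)
    return all(da[k] == db[k] for k in da if k not in PER_INSTANCE)


base = InstanceConfig(
    "family-a", "paper-a", "journal-a.jsonl", ("ai_infrastructure",), "v1", "v1"
)
rival = replace(
    base, model_family="family-b", account_id="paper-b", journal_path="journal-b.jsonl"
)
```

Deriving the second instance from the first with `replace` makes accidental drift in prompts or guardrails harder, and the check catches it if it happens anyway.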
This isn't a benchmark on test questions. It's a benchmark on capital allocation. Separate accounts. Real market data. Side by side. The model family that demonstrates better reasoning over time earns the right to manage real capital.
The Loop
The weekly review is where it gets recursive. It is inspired by Karpathy's autoresearch: an AI system analyzes its own performance and proposes improvements.
Each council's flagship model reviews its own week of decisions. What worked. What didn't. Where the agents were systematically wrong. Not just outcome analysis — reasoning analysis.