Cobb-Douglas AI
Sketch · In Progress
The instinct, when adding a new input, is to bolt a third exponent onto Cobb-Douglas: Y = A · L^α · K^β · T^γ, where T is tokens, or tasks, or "intelligence-on-demand". It's tempting, and it's almost certainly wrong in a way worth being precise about. The interesting question isn't whether T belongs in the function but how, and that depends on the regime of AI capability you're modelling. I see two avenues, and most of the current debate confuses them.
Cobb-Douglas assumes a constant elasticity of substitution between every pair of factors, and that elasticity equals 1. That assumption breaks immediately for AI. AI is a near-perfect substitute for labour in some tasks (ticket triage, first-draft copy, boilerplate code) and a near-zero substitute in others (judgment under genuine ambiguity, trust-laden sales conversations, escalations). Compressing that range into a single γ is dishonest.
Two more honest forms: (a) nested CES with task-level elasticities — σ < 1 between AI and human labour within a task, σ > 1 across the bundle; or (b) a task-based production function in the Acemoglu-Restrepo tradition, built up from tasks rather than imposed from above. I'll keep using "Cobb-Douglas" as shorthand because the α, β, γ labels are useful, but the functional form is a fiction. What matters is the story about the exponents.
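A minimal sketch of form (a), to make the nesting concrete. Everything here is an illustrative assumption rather than a calibrated model: the function names, the two-task setup, the equal outer weights, and every number are hypothetical. The inner CES combines human hours and tokens within a task; the outer CES combines the two tasks.

```python
def ces(x, y, share, sigma):
    """CES aggregate of two positive inputs; sigma is the elasticity of substitution."""
    if abs(sigma - 1.0) < 1e-9:
        # Cobb-Douglas is the sigma -> 1 limit of CES.
        return x ** share * y ** (1.0 - share)
    rho = (sigma - 1.0) / sigma
    return (share * x ** rho + (1.0 - share) * y ** rho) ** (1.0 / rho)


def nested_output(tasks, sigma_outer):
    """Two-task bundle: each task combines human hours and tokens, then tasks combine.

    tasks is a list of two dicts with hypothetical keys:
    human, tokens, share_human, sigma (the within-task elasticity).
    """
    inner = [ces(t["human"], t["tokens"], t["share_human"], t["sigma"]) for t in tasks]
    # Equal outer weights are an arbitrary choice for the sketch.
    return ces(inner[0], inner[1], 0.5, sigma_outer)


# Hypothetical inputs: sigma < 1 within each task, sigma > 1 across the bundle.
tasks = [
    {"human": 10.0, "tokens": 40.0, "share_human": 0.5, "sigma": 0.5},
    {"human": 30.0, "tokens": 5.0, "share_human": 0.5, "sigma": 0.5},
]
y_nested = nested_output(tasks, sigma_outer=2.0)
```

Setting every sigma to 1 collapses the whole structure back to Cobb-Douglas, which is one way to see how much the single-exponent form throws away.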
In the first regime, Avenue A, AI isn't a factor at all. It's a technology shifter: either inside the Hicks-neutral A term or, more usefully, a multiplier φ on the effective units of labour:
Y = A · (φ · L)^α · K^β,  φ ≥ 1
Tokens get bought; the buying just lets each worker get more done per hour. The model and its inference flow complement labour rather than substitute for it. This is where most of the current empirical evidence actually sits.
- Brynjolfsson, Li & Raymond (2023) — generative AI in customer support raised resolutions per hour by ~14%. Crucially, the largest gains went to novice workers. The model didn't replace them; it compressed the experience curve. NBER WP 31161.
- Noy & Zhang (2023, Science) — writing tasks completed ~40% faster with ChatGPT, output quality steady-to-improved. Same shape: augmentation, not substitution.
- Peng et al. (2023) — GitHub Copilot, ~55% speedup on a controlled coding task. Developer remains in the loop throughout.
- Anthropic Economic Index (Feb 2025) — task-level decomposition showing the augmentation share materially exceeds the automation share across most occupations sampled in the first release.
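To see what a labour-augmenting φ does to output, here is a small sketch of the Avenue A form. Reading a ~14% rise in resolutions per hour as φ = 1.14 is an assumption made purely for illustration, and the values of A, L, K, α and β are hypothetical.

```python
def output(A, L, K, alpha, beta, phi=1.0):
    """Y = A * (phi * L)^alpha * K^beta, with phi scaling effective labour."""
    return A * (phi * L) ** alpha * K ** beta


# Hypothetical parameters; phi = 1.14 is an illustrative reading of a ~14% gain.
base = output(A=1.0, L=100.0, K=50.0, alpha=0.6, beta=0.4)
boosted = output(A=1.0, L=100.0, K=50.0, alpha=0.6, beta=0.4, phi=1.14)

# The proportional output gain is phi**alpha - 1, independent of A, L and K.
gain = boosted / base - 1.0
```

Because φ enters inside the labour term, the output gain is damped by α: a worker-level speedup does not pass through one-for-one to Y.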
The policy-relevant question inside Avenue A is the distribution of φ. Compression (novices catch up) or bifurcation (experts pull further away)? The evidence leans compression — which is itself a real result, and an uncomfortable one for the "AI will hollow out the middle" narrative that dominated 2023-2024.
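The compression-versus-bifurcation distinction can be stated as a comparison of productivity dispersion before and after φ is applied worker by worker. A rough sketch, assuming a top-to-bottom ratio as the dispersion measure (one of many possible choices) and hypothetical numbers:

```python
def dispersion_ratio(productivity):
    """Top-to-bottom productivity ratio for a group of workers."""
    return max(productivity) / min(productivity)


def classify_dispersion(before, phi):
    """Apply a worker-specific phi and report how dispersion moved."""
    after = [p * f for p, f in zip(before, phi)]
    r_before, r_after = dispersion_ratio(before), dispersion_ratio(after)
    if r_after < r_before:
        return "compression"
    if r_after > r_before:
        return "bifurcation"
    return "neutral"


# Hypothetical two-worker example: [novice, expert] baseline productivity.
novice_gains_more = classify_dispersion([1.0, 2.0], phi=[1.30, 1.05])
expert_gains_more = classify_dispersion([1.0, 2.0], phi=[1.05, 1.30])
```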
"AI as A-shifter" elides cost. Tokens have real marginal cost — paid per task, in opex, often a non-trivial fraction of gross margin. Modelling φ as pure TFP misrepresents the income statement. The cleaner formulation has each unit of task-labour as a bundle of "human-time + AI-tokens" jointly, with the bundle entering as effective L. That matters because it changes what the firm is optimising: not "how productive is my worker?" but "what's the cheapest mix to complete this task?"
In the second regime, Avenue B, the agent runs autonomously: it closes the ticket, writes the function, drafts the contract, joins the meeting. Here the production function genuinely changes shape and AI earns its own exponent. But "AI as capital" needs care, because three very different things get conflated under that label:
A Series B SaaS firm spending $400K/month on Anthropic and OpenAI isn't holding capital in the sense a factory holds a CNC machine. It's buying a service flow whose underlying asset sits on someone else's balance sheet. So the sharper Avenue B question isn't "is AI capital?" — it's "for whom is AI capital?" Foundation labs hold the capital. Customer firms buy a service that competes with labour at the task level. The aggregate story has to be assembled from both views, and a one-sector model that just swaps "AI" in for K will get the long-run incidence wrong.
The popular framing — agents are workers, agents are owned, therefore agents are capital — skips a step. What the firm owns is a contractual right to inference, not a stock of agents. Reclassifying that as capital risks importing capital-deepening dynamics (declining marginal product, accumulation paths, depreciation schedules) that don't actually apply to a service subscription. The economics of the lab and the economics of the customer firm need to be modelled separately before they're aggregated.
Here's where the question gets sharp. Picture a firm with a monthly token budget B and a headcount L. B triples over 18 months. What else changes is the tell:
Now the tension. The accounting tells one story; the hiring decisions tell another. Token spend is booked as opex, often as cost of revenue, which behaves like a variable input: labour-adjacent. But the headcount decisions (slowed hiring in customer support, junior copywriting, paralegal review, BPO-style code maintenance) behave like capital-labour substitution at a long-run margin.
Firms are treating AI as labour on the income statement and as capital in the org chart.
That gap is the research question. The accounting choice isn't a typo — opex is genuinely the right book entry for an inference subscription. But the decision it enables looks much more like the kind of multi-year factor swap that capital deepening describes. The two views can both be right at the same time, and that's what makes the dynamics worth modelling separately from either pure Avenue A or pure Avenue B.
My current hypothesis: firms haven't decided which lens they're operating in, and the inconsistency between the two is doing real damage to workforce planning. Token spend grows because each engineer "needs" it (Avenue A logic — productivity tool), and headcount freezes happen in parallel (Avenue B logic — substitute available). Both decisions feel locally rational. The aggregate is incoherent.
Open Questions · What Would Discriminate
Empirical work I'd want to see, roughly in order of tractability:
- Token spend per FTE across cohorts of comparable firms, correlated with role-level hiring slowdowns. Cross-firm panel, plausibly assemblable from public filings + LinkedIn data + provider disclosure.
- Within-firm task allocation — are tokens deployed in support of existing workers (A) or to retire tasks entirely (B)? The Anthropic Economic Index gets at this at the task level; the firm-level cut is harder and more revealing.
- Wage dispersion within roles. Avenue A predicts compression (novices catch up). Avenue B predicts bifurcation (the workers who remain are the ones whose tasks resist agent substitution).
- The γ-vs-α-fall test. Functionally similar, theoretically distinct: γ rising says AI is a new input; α falling says human labour just got better. The cleanest discriminator is whether AI spend co-moves with output (γ) or substitutes for L holding output constant (α↓).
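As a crude illustration of the last discriminator, the sketch below sorts firm-period observations by whether token spend moves with output or displaces headcount at roughly constant output. The record type, field names and the 2% tolerance are all hypothetical, and a real test would need a panel regression with controls rather than a threshold rule.

```python
from dataclasses import dataclass


@dataclass
class FirmPeriod:
    """Hypothetical firm-period growth rates, expressed as fractions."""
    output_growth: float
    headcount_growth: float
    token_spend_growth: float


def classify_firm(obs, tol=0.02):
    """Threshold rule for the co-movement test; tol is an arbitrary cutoff."""
    if obs.token_spend_growth <= tol:
        return "no AI scale-up"
    if obs.output_growth > tol and obs.headcount_growth >= -tol:
        return "co-moves with output"
    if abs(obs.output_growth) <= tol and obs.headcount_growth < -tol:
        return "substitutes for L"
    return "ambiguous"


# Hypothetical observations, not data.
grower = classify_firm(FirmPeriod(0.20, 0.00, 2.00))
shrinker = classify_firm(FirmPeriod(0.00, -0.15, 2.00))
```

Many real firms would land in "ambiguous" (output up and headcount down at once), which is exactly why the two readings are hard to separate empirically.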
Acemoglu (2024, "The Simple Macroeconomics of AI") argues aggregate impact is modest — well under 1% TFP per decade under reasonable parameter choices. That may be right at the macro level while role-level substitution is severe. "Macro modest, micro brutal" is a real possibility, and aggregate TFP measures will badly hide it. Anyone using top-line productivity numbers to dismiss labour-market concern is reading the wrong gauge.
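The "macro modest, micro brutal" arithmetic is just a share-weighted average. The sketch below uses made-up wage-bill shares and role-level changes in labour demand (none of these numbers are estimates, and it tracks labour demand rather than TFP) to show how a small aggregate figure can coexist with a severe role-level one.

```python
def aggregate_change(roles):
    """Wage-bill-weighted average of role-level changes in labour demand."""
    total_share = sum(share for share, _ in roles.values())
    assert abs(total_share - 1.0) < 1e-9, "shares must sum to 1"
    return sum(share * change for share, change in roles.values())


# role: (share of total wage bill, change in labour demand). All hypothetical.
roles = {
    "customer support": (0.02, -0.30),
    "junior copywriting": (0.01, -0.25),
    "everything else": (0.97, 0.00),
}

macro = aggregate_change(roles)                       # economy-wide average
micro_worst = min(change for _, change in roles.values())  # hardest-hit role
```

With these inputs the aggregate moves by under one percent while the hardest-hit role loses thirty percent of demand, so the top-line number carries almost no information about the exposed roles.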