Why most AI features in startup products are theater

A working definition: AI theater is when a product uses AI to perform intelligence rather than to produce it.

The distinction is visible once you know how to look. The AI theater version has a loading animation, a confidence score displayed to four decimal places, and UI terminology that signals ML sophistication. The actual output is something you could have generated with a simpler system, or didn't need to generate at all. The AI is set dressing.

Why it happens

The first cause is investor pressure. "AI-powered" is in the pitch deck because it needs to be. Once it's in the pitch deck, it needs to be in the product. So the product team builds something that justifies the claim. That something is often underpowered or irrelevant, but it exists, and the claim is technically true.

The second cause is mistaking novelty for value. A feature that uses ML is interesting to build. Engineers like building interesting things. The question of whether the ML output actually changes user behavior or improves outcomes is different from the question of whether the ML is technically impressive. These can come apart, and they do, regularly.

The third cause is the demo problem. A demo of ML working on clean data in controlled conditions looks compelling. The same model running on real user data, with real variability, performs differently. Products get shipped based on demo performance, and the production performance never gets properly measured.

The honest check

I've started applying a simple test to AI features: if the model were removed and replaced with the median human output for the same task, would the user notice? If the answer is no, the feature might be theater.

A related check: is the AI output actually influencing user decisions? Not being displayed. Not being acknowledged. Actually changing what the user does next. If the answer is no, it's ambient ornamentation.

A third check: what's the counterfactual? If we shipped this feature without the ML, with a simpler approach, would outcomes be worse? Sometimes the answer is genuinely yes. Often it isn't.
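
If you wanted to run that counterfactual rather than argue about it, here is a minimal sketch, assuming you can randomize users into arms and log one downstream outcome you actually care about. Everything here is hypothetical: the arm names, `assign_arm`, `log_outcome`, and the metric are placeholders, not anything we've shipped.

```python
import hashlib
from collections import defaultdict

# Hypothetical three-arm experiment: no feature, a simple non-ML
# baseline, and the ML model. Log the same downstream outcome for all.
ARMS = ["feature_off", "simple_baseline", "ml_model"]

def assign_arm(user_id: str) -> str:
    # Stable hash so a given user always lands in the same arm,
    # across sessions and process restarts.
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return ARMS[digest % len(ARMS)]

outcomes: defaultdict[str, list[float]] = defaultdict(list)

def log_outcome(user_id: str, value: float) -> None:
    # `value` should be the behavior you care about (conversion,
    # retention, revision rate), not engagement with the AI widget.
    outcomes[assign_arm(user_id)].append(value)

def summarize() -> None:
    for arm in ARMS:
        vals = outcomes[arm]
        mean = sum(vals) / len(vals) if vals else float("nan")
        print(f"{arm:>15}  n={len(vals):<6d} mean outcome={mean:.3f}")

# If ml_model can't beat simple_baseline (or even feature_off) on the
# outcome metric, the counterfactual answer is: no, outcomes wouldn't
# be worse without the ML.
```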

What this looks like at Zillion Pitches

We've been honest with ourselves about this. The structural analysis we do (identifying whether a pitch contains a clear problem statement, solution, and market section) is genuinely useful. Founders change their pitches based on it. We have before-and-after data.

The tone score we display is probably theater. Founders look at it. They don't change behavior based on it. It adds an appearance of depth to the report without adding depth. We've been debating whether to remove it or replace it with something we can actually demonstrate is useful.

The honest answer is that we shipped it because it looked good in the demo. That's theater by definition.

The harder question

AI theater isn't always a product failure. If it lives in a peripheral feature that users genuinely enjoy even without the practical utility, the case for keeping it is weak but not zero. Delight has value.

The dangerous version is AI theater in the core value proposition. If the main thing your product does is powered by ML that isn't actually working, you have a product that depends on users not investigating closely. That works until it doesn't.

Most products I see in the Draper ecosystem have at least one AI theater feature. In several, the entire value proposition is theater. The market is not good at punishing this, at least in the short term, which is why it persists.

The correction usually comes at the point where someone runs the honest experiment: what if we turned this off? The answer determines whether you have a product or a demo.
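
A rough first read on that experiment doesn't need tooling; a two-sample z statistic over the logged outcomes is enough to see whether anything moved when the feature went away. This is a sketch under the usual large-sample assumptions, with a hypothetical `z_stat`, not a substitute for a properly powered analysis:

```python
import math

def z_stat(feature_on: list[float], feature_off: list[float]) -> float:
    """Approximate two-sample z statistic for the difference in means.
    Positive means the feature-on arm did better on the outcome."""
    def mean_and_var(xs: list[float]) -> tuple[float, float]:
        m = sum(xs) / len(xs)
        v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance
        return m, v

    m_on, v_on = mean_and_var(feature_on)
    m_off, v_off = mean_and_var(feature_off)
    se = math.sqrt(v_on / len(feature_on) + v_off / len(feature_off))
    return (m_on - m_off) / se

# With a decent sample, |z| < ~2 means users didn't measurably notice
# the feature going away: a demo, not a product.
```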

With gusto, Fatih.