Why Can’t 100-Billion-Parameter AI Models Create a Simple Puzzle?
I was trying to create a simple child-friendly puzzle for our math lab and discovered something very interesting. ChatGPT, Claude, and Gemini all suck at this. My colleagues and I could create one fairly easily.
The AI models either fail repeatedly or confidently provide obviously wrong solutions. ChatGPT’s most powerful model even stated that it was unsolvable.
Here’s the prompt as text, if you want to try it out yourself:
Create a Kid-Friendly Emoji Math Puzzle that adheres to the following. And then try and solve it to show the answer is correct.
Core Rules
1. Use simple, recognizable emojis (fruits or animals)
2. Create 4–6 addition equations
3. Equation formats:
   Primary format: 🍎 + 🍌 = 🐱
   Alternative format: 🍎 + 🍌 = 3 (allowed at most once)
Constraints
1. No repeated emojis on left side of equations
2. Each number (0 through 5) must be represented exactly once in the emoji-to-number mapping
3. Each emoji must be assigned exactly one number, and each number must be assigned to exactly one emoji
4. Only numbers 0–5 allowed
5. Equations must be horizontal with consistent spacing
6. Each emoji must represent the same number throughout the puzzle
7. All emojis used in the equations must appear in the emoji-to-number mapping
8. All emojis in the emoji-to-number mapping must be used in at least one equation
Note: The puzzle should be solvable using only the given equations and constraints, without requiring additional mathematical knowledge beyond basic addition.
Gemini is the most confident of the lot, and obviously and obliviously wrong.
Note: I couldn’t try the advanced version of Gemini because they dark-patterned me into paying for 4 months of their very expensive subscription without me realizing it. And this being Google, I have no way of even asking for a refund. Not going to burn my fingers again.
ChatGPT’s most powerful model (o1-preview) thinks for a long time and then states that the puzzle is unsolvable.
It does, however, graciously apologize when I point out that the puzzle is solvable.
Claude is smarter and seems to understand that it can be solved, but it tries and fails repeatedly.
What’s going on? This is just 5 linear equations involving single-digit numbers. There’s nothing complex here: no calculus, no special math knowledge needed beyond basic addition.
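To put a number on "nothing complex": six emojis mapped one-to-one onto 0 through 5 give only 6! = 720 candidate mappings, so a proposed puzzle can be checked exhaustively. Below is a minimal Python sketch of that check. The names `EMOJIS`, `EQUATIONS`, and `solve` are hypothetical, and the example puzzle is an illustration made up for this sketch, not the worksheet puzzle. It also assumes constraint 1 applies within a single equation (no 🍎 + 🍎); if it applied across the whole puzzle, six emojis could fill at most three left-hand sides, fewer than the four equations required.

```python
from itertools import permutations

# Illustrative puzzle (an assumption for this sketch, not the worksheet's).
# Each equation is (left, left, right); the right side may be an emoji
# or a plain number (the "alternative format").
EMOJIS = ["🍎", "🍌", "🍇", "🍓", "🍒", "🍍"]
EQUATIONS = [
    ("🍎", "🍌", "🍌"),
    ("🍌", "🍇", "🍓"),
    ("🍌", "🍓", "🍒"),
    ("🍌", "🍒", "🍍"),
]


def solve(emojis, equations):
    """Return every one-to-one emoji-to-number mapping that fits all equations."""
    solutions = []
    # Each permutation of 0..5 is one candidate mapping (720 in total).
    for values in permutations(range(len(emojis))):
        mapping = dict(zip(emojis, values))
        if all(
            mapping[a] + mapping[b] == (mapping[r] if isinstance(r, str) else r)
            for a, b, r in equations
        ):
            solutions.append(mapping)
    return solutions


solutions = solve(EMOJIS, EQUATIONS)
print(len(solutions))  # prints 1: the puzzle has a unique answer
print(solutions[0])
```

For this example the search finds exactly one mapping: 🍎 = 0, 🍌 = 1, 🍇 = 2, 🍓 = 3, 🍒 = 4, 🍍 = 5. A puzzle that returned zero mappings would be broken, and one that returned several would be ambiguous.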
My guess is this reveals something interesting about AI models: they struggle with flipping or reversing thought sequences. Sure, they can chain thoughts linearly, and they’ll reverse lists if you explicitly ask them to. But this puzzle needs something different: the ability to work backwards naturally, which humans often do without thinking. We start with what we want and build backwards to create the puzzle. AI models seem to lack this ability unless explicitly prompted.
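The "start with what we want and build backwards" idea can be sketched in a few lines of Python. This is one possible reading of that process, not necessarily how the worksheet puzzle was made: pick the answer first, list every addition fact that is true under it, then keep a handful of facts that pin the answer down. The names `count_solutions` and `build_puzzle` are hypothetical, the sketch uses only the primary emoji = emoji format, and it assumes the "no repeated emojis" rule applies within a single equation.

```python
import random
from itertools import combinations, permutations

EMOJIS = ["🍎", "🍌", "🍇", "🍓", "🍒", "🍍"]


def count_solutions(equations):
    """Count the one-to-one mappings onto 0..5 that satisfy every equation."""
    return sum(
        all(m[a] + m[b] == m[r] for a, b, r in equations)
        for m in (dict(zip(EMOJIS, p)) for p in permutations(range(6)))
    )


def build_puzzle(seed=0):
    rng = random.Random(seed)
    # Step 1: start from the answer, a secret emoji-to-number mapping.
    values = list(range(6))
    rng.shuffle(values)
    secret = dict(zip(EMOJIS, values))
    by_value = {v: e for e, v in secret.items()}
    # Step 2: list every true fact x + y = z with x != y and z <= 5.
    facts = [
        (by_value[x], by_value[y], by_value[x + y])
        for x in range(6)
        for y in range(x + 1, 6)
        if x + y <= 5
    ]
    # Step 3: try sets of 4 to 6 facts in random order and keep the first
    # one that uses all six emojis and admits only the secret mapping.
    candidates = [c for k in (4, 5, 6) for c in combinations(facts, k)]
    rng.shuffle(candidates)
    for equations in candidates:
        used = {e for eq in equations for e in eq}
        if used == set(EMOJIS) and count_solutions(equations) == 1:
            return secret, list(equations)
    raise RuntimeError("no valid puzzle found for this mapping")


secret, equations = build_puzzle(seed=1)
for a, b, r in equations:
    print(f"{a} + {b} = {r}")
```

Because every candidate equation is true under the secret mapping by construction, a solution count of 1 means the secret mapping is the only answer, so the hard part of the puzzle is done before any equation is written down.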
What does that tell us? What might it mean that these advanced AI models stumble on something this basic?
Want the solution? Please sign up here and I’ll send you the worksheet with the puzzle — you’ll see it really is quite simple to put together!