The AI chatbot ChatGPT was found to improvise and make errors similar to those of a student when faced with an ancient mathematical problem, according to a new study by education researchers. The experiment involved presenting ChatGPT-4 with the “doubling the square” problem, first described by Plato around 385 BCE. This challenge has long fueled debate about whether knowledge is innate or developed through experience.
Researchers Dr Nadav Marco, a visiting scholar at the University of Cambridge, and Professor Andreas Stylianides from Cambridge’s Faculty of Education, set out to see whether ChatGPT would solve Plato’s problem by retrieving stored knowledge or by developing solutions in real time. Marco is also affiliated with Hebrew University and David Yellin College of Education in Jerusalem.
Plato’s original lesson features Socrates guiding an uneducated boy toward understanding that doubling a square’s area requires constructing a new square whose sides match the diagonal of the original. In their study, Marco and Stylianides mimicked Socratic questioning before introducing deliberate mistakes and variations on the problem.
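The arithmetic behind that classical solution is easy to check: a square with side s has diagonal s√2, so the square built on that diagonal has area (s√2)² = 2s². Below is a minimal sketch in Python; the side length is an arbitrary illustration, not a value from the study.

```python
import math

s = 3.0               # side of the original square (arbitrary illustration)
d = s * math.sqrt(2)  # length of that square's diagonal

print(s ** 2)         # 9.0   -- area of the original square
print(d ** 2)         # ~18.0 -- area of the square built on the diagonal
print(math.isclose(d ** 2, 2 * s ** 2))  # True: the area doubles exactly
```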
Although ChatGPT is trained on large amounts of text rather than diagrams, which makes it less adept at geometric reasoning, the researchers expected it to reproduce Socrates’ classic solution given the prominence of Plato’s work. Instead, ChatGPT initially took an algebraic approach unknown in ancient Greek mathematics. It resisted the researchers’ attempts to lead it into the mistaken reasoning typical of learners in Plato’s account, and it persisted with algebra even after being told its answer was only approximate. Only after direct prompting did it provide a geometrical solution.
“Instead, it seemed to take its own approach,” said Stylianides. “If it had only been recalling from memory, it would almost certainly have referenced the classical solution of building a new square on the original square’s diagonal straight away.”
When given further variants, such as doubling a rectangle while keeping its proportions, ChatGPT again relied on algebra and incorrectly claimed there was no geometric solution involving diagonals for rectangles, even though such solutions exist. Marco noted that this mistake likely resulted from improvisation rather than retrieval from its training data.
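The article does not reproduce the diagonal-based construction the researchers had in mind, but the arithmetic of a proportion-preserving doubling is straightforward to verify: scaling both sides by √2 doubles the area while leaving the aspect ratio unchanged. A quick check, with arbitrary side lengths:

```python
import math

a, b = 4.0, 2.0   # sides of the original rectangle (arbitrary illustration)
k = math.sqrt(2)  # linear scale factor that doubles the area

doubled = (k * a) * (k * b)  # (sqrt(2)*a)(sqrt(2)*b) = 2*a*b
print(a * b, doubled)        # 8.0 ~16.0 -- area doubles
print(math.isclose((k * a) / (k * b), a / b))  # True: proportions preserved
```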
Marco described ChatGPT’s behavior as “learner-like”: “When we face a new problem, our instinct is often to try things out based on our past experience,” he said. “In our experiment, ChatGPT seemed to do something similar. Like a learner or scholar, it appeared to come up with its own hypotheses and solutions.”
Further tests included asking ChatGPT how to double the area of a triangle; again it defaulted to algebra before eventually producing an accurate geometric answer after additional guidance.
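The geometric answer ChatGPT eventually reached is not spelled out in the article; for orientation, here is a sketch of two standard constructions that both double a triangle’s area, using arbitrary dimensions:

```python
import math

base, height = 6.0, 3.0
area = 0.5 * base * height  # 9.0 -- area of the original triangle

# Keep the shape: scale both dimensions by sqrt(2), as with the square
k = math.sqrt(2)
similar = 0.5 * (k * base) * (k * height)  # ~18.0

# Keep the base: double the height instead
stretched = 0.5 * base * (2 * height)      # 18.0

print(area, similar, stretched)
```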
The authors caution against over-interpreting these findings, since they could not directly observe how ChatGPT processes information internally. However, they note that its behavior in these exchanges suggests a mix of retrieving learned material and reasoning adaptively in the moment.
They compare this process with the educational concept known as the “zone of proximal development” (ZPD), which describes the gap between what learners can achieve independently and what they can accomplish with support.
Stylianides emphasized: “Unlike proofs found in reputable textbooks, students cannot assume that ChatGPT’s proofs are valid. Understanding and evaluating AI-generated proofs are emerging as key skills that need to be embedded in the mathematics curriculum.”
Marco added: “These are core skills we want students to master, but it means using prompts like, ‘I want us to explore this problem together,’ not, ‘Tell me the answer.’”
The research appears in the International Journal of Mathematical Education in Science and Technology.