Who Checks the Magic Square?

Identifying Systemic Bottlenecks to Science

Published

2026-05-01

This is my submission to the Astera Essay Competition on identifying systemic bottlenecks to science. I point out what I consider the single biggest obstacle for the social sciences: verifying quality. I offer three solutions and no optimism.

Detail of Dürer's Melencolia I showing the title banner, comet, geometric solid, putto, magic square, and seated figure.

Albrecht Dürer, Melencolia I (1514), detail. Click for full view.

Melencolia I depicts a winged personification of Melancholy seated in paralysis, surrounded by the tools of geometry. She can measure the world but not overcome it.

Let me start with two stylized facts: first, the tools of artificial intelligence (AI) are here to stay; they have already changed science. Second, science, and social science in particular, has, and has always had, an incentive problem, because research quality is increasingly cheap to imitate but expensive to verify.

A magical solution: the reviewer ex machina

Detail of the magic square from Dürer's Melencolia I

In Melencolia I, a four-by-four magic square sits high on the wall. Every row, column, and diagonal sums to 34. The bottom row encodes 1514, the year of the print. A computer can produce millions of these squares in milliseconds, but who will check if they are indeed magic squares?
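The arithmetic part of that check is, of course, trivial to automate. A minimal sketch in Python (the function name and layout are mine, not a standard library):

```python
# Dürer's magic square from Melencolia I (1514).
DURER = [
    [16,  3,  2, 13],
    [ 5, 10, 11,  8],
    [ 9,  6,  7, 12],
    [ 4, 15, 14,  1],
]

def is_magic(square):
    """Return the magic constant if every row, column, and both
    diagonals share the same sum, otherwise None."""
    n = len(square)
    target = sum(square[0])
    rows = all(sum(row) == target for row in square)
    cols = all(sum(square[i][j] for i in range(n)) == target for j in range(n))
    diag = sum(square[i][i] for i in range(n)) == target
    anti = sum(square[i][n - 1 - i] for i in range(n)) == target
    return target if rows and cols and diag and anti else None

print(is_magic(DURER))           # → 34
print(DURER[3][1], DURER[3][2])  # → 15 14, the date in the bottom row
```

A machine can run this check millions of times per second. The interesting question, as the rest of this essay argues, is everything the check does not cover.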

This brings me to my first solution: the reviewer ex machina. “Obviously,” I hear you say, “the solution is an AI-assisted review process.”

Perhaps there is a real case for this. AI systems can already check mathematical proofs, flag inconsistencies, inspect code, and attempt autonomous replications. Recent work on agentic reproduction of social-science results1 is exactly the sort of advance that makes this tempting: let the machine read the paper and produce an evaluation. But this solves only the easiest part of verification. It can tell us whether the rows add to 34. It cannot yet tell us whether 34 was the right thing to measure, whether we care about 34, whether the identifying assumption is credible, or whether the answer was merely convenient rather than innovative. Crucially, judgment itself does not scale, because someone, a person, still has to spend time verifying the scientific quality.

A tried solution: show all your work

Detail of the hourglass and tools from Dürer's Melencolia I

Dürer filled the engraving with an epistemic apparatus in the form of tools. We see keys and mathematical instruments. There is an hourglass. Melancholy is holding, dejectedly, a pair of dividers. All of these tools, in my interpretation, suggest that knowledge and transparency will fix the problem.

This brings me to my second solution: more disclosure. We in finance and accounting love disclosure.

Disclosing information about how the research was produced is the open-science answer: public data, public code, preregistration, registered reports, data-availability statements, AI-use statements, and perhaps soon LLM-readable papers.2 I am in favor of all of this. However: from 2013 to 2016 I worked at the Open Data Institute3 in London, during what I now think of as the open-data bubble. McKinsey had estimated that open data could unlock more than $3 trillion in annual value.4 I used to think that better data standards were the work. The problem, in retrospect, was that the commercial incentives were never there to fully realize the vision, and I believe the incentives in science are likewise not there to fully move to an open and reproducible state of research production.

A social solution: revolution, maybe?

Detail of the putto from Dürer's Melencolia I

Dürer also added a second figure. The cherub, or putto, probably does not have one fixed symbolic meaning. In my view, it serves as a contrast: a small active maker beside a larger figure who is paralyzed by thought.

This brings me to my third solution: social change. When it comes to academic success, it appears your social network reigns supreme.5 The question is whether any of that activity verifies quality.

The social answer is to make results pass through frictions beyond standard peer review. (Or to revolutionize the academic reward system, but I'm not holding my breath.) Limits on submissions are already happening. Presentations and communication are rising in importance. Young scholars are increasingly evaluated on how well they can defend their claims in front of people who know the topic or the methods. But this is surely not enough, because attention remains rationed and the marginal cost of producing one more paper approaches zero. Maybe the best we can do is the antithesis of techno-solutionism: find like-minded people and slowly chip away at the problem?

In conclusion

The systemic bottleneck in the social sciences is clear and, I would argue, known: verifying quality. My answer to what to test is probably the most scientific one: it depends, and yes, some version of all of the above. I expect little voluntary social change, and disclosure alone has already shown its limits. So if I had to bet on one practical experiment, I would test a fair, automated, and integrated (!) version of the reviewer ex machina. The machine, however, should not pretend to certify truth. It should produce a public record of what was checked, what failed, what was repaired, and what still requires human judgment.