Trading Polish for Proof in the Post-Essay University
We grade the pottery, not the dig site. We grade the artifact, not the excavation. We grade the fossil, not the stratified layer that proves when it lived. This fundamental error has left us defenceless in an age where any artifact can be conjured from nothing, where perfect pottery appears without human hands ever touching clay.
The revolution isn't that AI can write. The revolution is that writing no longer proves thinking happened. The essay was fossilised thought, and we could carbon-date its authenticity by its very existence.
That world is gone. A student can generate a first-class essay in twelve seconds. A literature review that would have taken weeks emerges fully formed in under a minute. The traditional term paper, that monument to sustained intellectual effort, can be summoned like ordering coffee through an app. We are archaeologists whose entire field has just discovered that fossils can be 3D printed.
In a world of GenAI, every essay, every report, every reflection, every analysis that can be submitted asynchronously is epistemologically dead. It proves nothing about student learning. It demonstrates nothing about understanding. It signals nothing about capability.
The Archaeological Turn
But archaeology offers us a way forward. Real archaeologists don't care if a pot is beautiful. They care if it's real. They care where it was found. They care about the soil layer, the associated finds, the carbon deposits, the trace elements that prove when and where it existed. They care about provenance and provenience, the chain of custody and the site of discovery.
What if we stopped grading the pot and started grading the dig?
This isn't metaphorical. It's methodological. Every piece of student work needs what archaeologists call "context": the irreducible specifics of where it came from. Not the smooth surface of the final essay, but the rough edges of its emergence. Not the polished argument, but the documented moments of contact with the world that made the argument possible.
A student submits a business report. We don't grade the report. We grade the screenshot of the specific consultation page they found, timestamped and archived. We grade their two-line explanation of why this source and not another. We grade their evidence note that says which recommendation they killed and why. We grade their ability to reproduce one calculation when the parameters shift. The report is just packaging. The evidence of thought is in the metadata.
Reality Tokens and the Economy of Presence
In archaeology, the smallest shard can be the most valuable if it's found in the right layer. In assessment, we need our own version of the diagnostic shard, the tiny fragment that proves human presence.
Call them reality tokens: irreducible moments of contact between a mind and the world. A photograph of a local planning notice with a sentence about what it reveals. A sensor reading from an actual measurement, not a simulated one. A three-minute screen recording of adjusting a model when one parameter changes. A timestamped observation from a specific location that no model could predict, because it hadn't yet happened when the model was trained or last searched the web.
These aren't comprehensive. They're not meant to be. They're spot checks on reality, small enough to be unfakeable, specific enough to be unrepeatable, simple enough to mark at scale. They're what proves the student was present in their own learning.
The radical move is making these tokens the primary site of assessment. The traditional essay becomes secondary, almost decorative. What matters is the trail of evidence that the student actually encountered something, thought about it, made a decision, and can defend that decision when conditions shift.
The Forensics of Learning
We need to think like forensic investigators, not literary critics. Every submission needs chain-of-custody documentation. Not surveillance, but testimony. A standard evidence note, two hundred words maximum, that travels with every piece of work. It states what question drove the investigation, which sources changed the student's thinking and why they were trusted, how AI outputs were used and then altered, and what decision was made and what option was cut.
This is not about catching cheaters. It's about changing what counts as achievement. The student who can produce the most elegant paragraph with AI assistance gets the same grade as the student who produces a workmanlike paragraph without it, if neither can demonstrate the forensic trail of actual thought. But the student who can show they noticed something specific, verified it through a reproducible method, made a decision based on actual constraints, and can adapt when those constraints shift? That student excels, regardless of how polished their prose.
Education is about to have a reproduction revolution. Not reproducing entire papers, but reproducing tiny moments of competence. A five-minute window in the learning management system where students adjust their method to a small variant. A quick recalculation when the price schedule changes. A brief explanation of how their approach would shift if one policy clause were different.
This isn't testing. It's demonstration. It's the difference between claiming you can cook and actually adjusting a recipe when you're out of butter. It's the difference between saying you understand and showing you can adapt. It's what competence actually looks like in the world: not perfect initial performance, but responsive adjustment to changing conditions.
The technology already exists. Quiz tools can time windows. Notebooks can run variants. Rubrics can check specific claims. We don't need new platforms. We need new priorities. We need to value the adjustment more than the answer, the adaptation more than the artifact.
Under this framework, excellence looks nothing like what universities currently reward. The best student isn't the one who produces the most sophisticated argument. It's the one who leaves the clearest evidence trail. The one who notices the stubborn fact that doesn't fit the theory. The one who documents their decision to exclude a data source and can defend it. The one who catches the clause that would void their recommendation and proposes the smallest legal change that would save it.
This is what the professions actually want. Not people who can produce impressive documents, but people who can be trusted to notice what matters, verify what's uncertain, price what's costly, and adapt when reality shifts. The skills we're assessing are exactly the skills that remain human when everything else can be automated: presence, judgment, contact, adaptation.
The Post-Authenticity Academy
We've entered the post-authenticity age for text. The response isn't better detection. It's changing what we value. When photographs became infinitely manipulable, we didn't stop taking pictures. We started caring about metadata. When anyone could doctor an image, we started valuing the RAW file, the EXIF data, the proof of when and where and how the image was captured.
Text needs the same revolution. The essay as a standalone artifact is as obsolete as the undocumented photograph. What matters is not the surface but the substrate. Not the final form but the formation process. Not the polished stone but the geological layer where it was found.
This changes everything about how we teach. We're not training students to be better writers in competition with AI. We're training them to be reliable witnesses to their own thinking. We're not asking them to produce knowledge artifacts. We're asking them to document their contact with reality.
The Practical Revolution
Here's what makes this implementable tomorrow, not someday. Every piece requires only tools we already have. The evidence note is a text field in the submission form. The reality token is an attachment with two lines of methodology. The reproduction step is a timed quiz question or a notebook cell. The constraint checks are rubric lines that take under a minute to assess.
For a business course: students submit their recommendation with a screenshot of a real consultation page, identify the blocking policy clause, and then adjust one table in a three-minute window when a parameter changes.
For a data course: students submit their pipeline with a paragraph about goals and sources, and the autograder runs it on a tiny holdout dataset. The marker only reviews what fails.
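A minimal sketch of what such a holdout check might look like, assuming a hypothetical student function `student_pipeline` and a tiny instructor-held dataset the student never sees. All names and the example task (revenue per region) are illustrative, not a prescribed platform or rubric:

```python
# Illustrative autograder sketch: run a student's pipeline on a tiny
# instructor-held holdout dataset and surface only the failures for
# human review. All names here are hypothetical.

HOLDOUT = [
    {"region": "north", "units": 10, "price": 2.5},
    {"region": "south", "units": 4, "price": 3.0},
]

def student_pipeline(rows):
    # Stand-in for the submitted code: total revenue per region.
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["units"] * r["price"]
    return totals

def autograde(pipeline, holdout):
    """Run the pipeline on unseen data; return the failed checks only."""
    failures = []
    try:
        result = pipeline(holdout)
    except Exception as exc:
        return [f"pipeline crashed: {exc}"]
    expected = {"north": 25.0, "south": 12.0}
    for region, total in expected.items():
        if abs(result.get(region, 0.0) - total) > 1e-9:
            failures.append(f"{region}: got {result.get(region)}, expected {total}")
    return failures

print(autograde(student_pipeline, HOLDOUT))  # an empty list means every check passed
```

The point of the design is the last line: the marker never reads passing submissions, only the short list of failures, which is what keeps the scheme workable at scale.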
For a humanities course: students submit their essay with one local observation, one archived source, and a brief note on how their argument would shift if one historical fact were different.
This scales. It's sustainable. It works in massive online courses and small seminars. It doesn't require surveillance technology or synchronous assessment. It just requires accepting that we're not in the business of grading writing anymore. We're in the business of verifying thought.
We can evolve. We can recognise that in a world of infinite synthetic text, the only thing that remains scarce is documented contact with reality. We can shift from assessing outputs to assessing presence. We can grade the evidence trail, not the essay. We can value the metadata more than the data.
The archaeology of thought isn't a metaphor. It's a methodology. It's what assessment looks like when we accept that fluency is free, but presence is priceless. It's how we preserve the possibility of meaningful evaluation in an age of infinite simulation.
The student who graduates under this new framework won't be the one who wrote the best essays. They'll be the one who left the best evidence. They'll be the one we can trust to notice what others miss, to verify what others assume, to adapt when others freeze. They'll be the one who was actually present in their own education.
That's not a compromise with the AI age. That's what excellence has always actually looked like. We just forgot, because for a brief century, we could pretend that polished writing was proof of thought. That century is over. The age of archaeological assessment has begun.
Welcome to the dig site. Bring your own brush.



Provocative, and I appreciate radical solutions. I worry that some of this process documentation might also be turned over to AI, however, and, as a professor, I dread the thought of grading all of it. But we need new assessment tools, so I'm all for tossing out ideas.
What about learning writing itself? That matters a lot in K-12 and in writing-oriented majors.