The Assessment Crisis: When the Exemplar Cracks

Through Kuhn’s Lens — Weekly Column | May 11, 2026


A faculty senate at a mid-tier American university spent part of this week debating whether to retire the take-home essay entirely. The trigger was not a single scandal. It was a slow accumulation: a report finding that nearly 90% of students now use AI tools for coursework, surveys showing faculty unable to reliably distinguish AI-written from student-written prose, and a quiet admission from one of the largest detection vendors that false-positive rates remain stubbornly above what any disciplinary process can ethically tolerate. The senate’s resolution called the moment “a paradigm shift in academic assessment.”

That phrase is where this column begins, because that phrase is doing too much work.

What the senate is actually describing is not a shift in paradigm. It is a tool — generative AI — that has made the existing assessment paradigm cheaper to defeat. The exemplar of “a fair assessment” inside most universities remains what it was in 2019: a written artifact, produced unsupervised, graded for evidence of individual cognitive labor. The tool has changed. The frame has not. What faculty senates are voting on this week is whether to defend the old frame with new fences (lockdown browsers, oral defenses, in-class blue books) or to abandon parts of the artifact-based exemplar altogether.

These are different responses, and they belong to different Kuhnian categories. The column’s task this week is to separate them — and to ask whether the louder claim, the “paradigm shift” claim, is doing analytical work or political work for the people repeating it.

The Frame: Social Meaning of a Credential

The temptation is to file this under Higher Education and move on. The column declines that frame this week. The data points outward, not inward — toward employers redesigning interviews, toward licensing boards quietly piloting in-person practical exams, toward credentialing bodies in accounting and nursing reconsidering what their certificates are supposed to certify. The deeper pressure is not on the classroom. It is on the social meaning of a credential — what a degree, a certification, a passing score is taken to prove to a third party who was not in the room.

This is the Social Aspects of AI frame, and it is the one this week’s data illuminates most clearly.

A paradigm, in Thomas Kuhn’s strict sense, is not a worldview or a vibe. In the postscript to The Structure of Scientific Revolutions, and more carefully in the essay “Second Thoughts on Paradigms” collected in The Essential Tension, Kuhn narrowed the term. A paradigm is, at its core, an exemplar — a concrete model case that a community points to when it says “this is what good work looks like.” Apprentices learn the field by working through exemplars, not by memorizing rules. The rules are reconstructed afterward, often badly, by philosophers.

Apply this strictly. The exemplar of a credential, for the past century, has been: a person sat under controlled conditions, demonstrated competence through a sanctioned performance, and a recognized institution vouched for the result. The take-home essay was always a relaxation of this exemplar, justified on pedagogical grounds — that sustained writing is itself the competence being tested. The relaxation worked because cheating was effortful and detectable. Generative AI has not destroyed the exemplar. It has destroyed the relaxation.

The “discourse community” that holds this exemplar is not faculty. It is the wider public — employers, licensors, parents, the student herself ten years later checking whether her degree still means what she thought it meant. When this community starts asking what a credential proves, the pressure that follows runs back upstream through universities, certification boards, and assessment vendors. The data this week shows that upstream movement beginning.

Faculty are responding inside the classroom. But the louder, more consequential conversation is happening at the level of what credentials are for. That is the level worth analyzing.

Diagnostic One: Normal Science or Crisis?

Kuhn’s distinction between normal science and revolutionary science is not a value judgment. Normal science is not the lesser activity. Most working scientists, most of the time, are puzzle-solvers inside an accepted frame. Revolutions are rare and disorienting. Most apparent revolutions, on inspection, are vigorous normal science — the frame doing what frames do.

What is the assessment field doing this week?

The dominant response is normal science. Detection vendors release new model versions. Universities purchase lockdown browsers. Departments restore oral defenses. A widely circulated guidance document from a regional accreditor recommended a “return to proctored evaluation where stakes are high.” None of this reframes what assessment is for. All of it defends the existing exemplar — individual unsupervised cognitive labor, evidenced through artifact — by adding fences.

This is exactly what Kuhn predicted communities do when an anomaly appears. In The Structure of Scientific Revolutions, he observed that scientists confronted with a persistent anomaly first try to absorb it into the existing frame. They add epicycles. They refine instruments. They argue the anomaly is measurement error. Only when the accumulation of failed absorptions becomes intolerable does a community begin to consider that the frame itself is the problem.

The detection vendors are the epicycles. A recent comparative evaluation found commercial AI-text detectors performing at accuracy rates between 39% and 80% across various tests, with false positives falling disproportionately on non-native English writers. The vendor response to this has been to promise better models — that is, to promise the epicycle will work next quarter. The faculty response has been to disbelieve the vendors and add their own fences: in-class writing, oral components, process portfolios.

None of this is crisis behavior in the Kuhnian sense. It is competent, conservative, normal-science behavior. The question is whether it will hold.

The honest answer from here is: not yet known. The data shows institutions absorbing the shock, not yet abandoning the frame. A genuine crisis — in Kuhn’s sense, not the press-release sense — would show different signatures. It would show senior figures in the field publicly questioning whether the artifact-based exemplar can be defended at all. It would show employers visibly discounting credentials issued under the old frame. It would show licensing boards rebuilding from scratch rather than adding proctoring. Some of these signals are flickering. None are decisive.

Diagnostic Two: The Anomaly the Rhetoric Papers Over

Every paradigm has anomalies it tolerates. The interesting question is not “what does the frame fail to explain?” but “what does the current rhetoric work hard not to notice?”

This week’s rhetoric, from both vendors and institutions, papers over a specific anomaly: the assessment system was already failing before generative AI arrived, and the failure was already known.

Studies of grade inflation, course evaluation gaming, and the weak correlation between many academic credentials and downstream job performance predate ChatGPT by decades. Employers have for years run their own technical assessments precisely because they do not trust degree transcripts to predict capability. The credential’s signal had been degrading. AI did not start the fire; it poured accelerant on a smoldering structure.

This matters because the prevailing “crisis” narrative locates the problem in the technology. If the problem is the technology, the solution is technical: better detectors, better proctoring, better fences. If the problem is the deeper one — that the artifact-based exemplar has been weakening as a signal of competence for a generation — then no amount of detector accuracy will restore what was already eroding.

A useful test: ask any vendor making “AI-proof assessment” claims what their product would have done about contract cheating in 2019, when human ghostwriters wrote tens of thousands of student essays for hire. The honest answer is: nothing. The contract cheating market was already large, already invisible to detection software, already an unsolved problem. AI has industrialized what was already a service economy.

Naming this anomaly clearly is part of what the column is for. The “AI broke assessment” frame is comforting because it locates the problem outside the institution. The harder frame — “assessment was already breaking, and AI revealed it” — locates the problem inside choices the institution made about scale, cost, and convenience over decades. The second frame is less marketable. It is closer to the data.

Diagnostic Three: Incommensurability Across the Discourse Communities

Kuhn’s most contested concept is incommensurability — the claim that communities operating under different paradigms can use the same words to mean different things, so that they argue past each other without realizing it. They lack a shared standard for what would count as winning the argument. In The Last Writings of Thomas S. Kuhn, the concept is sharpened: incommensurability is fundamentally about translation failure between specialized vocabularies, not about general unintelligibility.

The assessment crisis is a textbook case. At least four communities are talking about “fair assessment” and meaning four different things.

Vendors mean: a process whose outputs can be certified, with high confidence, as not having been produced by current AI systems. Their exemplar is a sealed-room exam with biometric verification. Their evidence of fairness is statistical — detection accuracy, false positive rates, audit logs.

Faculty mean: a process that produces evidence of individual student learning over time. Their exemplar is the seminar paper revised across drafts. Their evidence of fairness is qualitative — the felt sense, accumulated across years of teaching, that this work reflects this student’s mind.

Students mean: a process whose rules are clear in advance, applied consistently, and aligned with what the course actually taught. Their exemplar is the rubric handed out at the start of the term. Their evidence of fairness is procedural — was I told what counted, and was I judged by what I was told?

Employers and credentialing bodies mean: a process whose output predicts on-the-job competence. Their exemplar is the practical exam or the technical interview. Their evidence of fairness is predictive — does the credential pick out people who can do the work?

These four exemplars are not reconcilable by negotiation. A detector vendor selling “99% accuracy” to a procurement office is not answering the faculty question, the student question, or the employer question. A faculty senate restoring oral defenses is not addressing the predictive concern. A licensing board adding in-person practical components is not solving the within-classroom puzzle.

The “assessment crisis” discourse this week treats these as a single conversation. They are four conversations using shared vocabulary. Until the communities recognize they are not solving the same problem, the proposed solutions will keep failing to satisfy anyone — because each community will judge the solutions against its own exemplar, find them wanting, and conclude bad faith.

This is what Kuhn meant by incommensurability: not confusion, but a structural feature of how specialized communities reason. The column’s claim is narrower than the philosophical literature requires: simply that the participants in this week’s debate would be helped by noticing how thoroughly they are arguing past each other, and that the press-release language of “shared crisis” obscures rather than illuminates the disagreement.

The “Paradigm Shift” Audit

The phrase “paradigm shift” appears repeatedly in this week’s coverage. A vendor press release announces a “paradigm shift in academic integrity solutions” — meaning, on inspection, a new product release with improved detection rates. A faculty senate resolution describes “a paradigm shift in assessment philosophy” — meaning, on inspection, a recommendation to expand oral examinations. A consulting report sold to university administrators is titled around “navigating the paradigm shift” — meaning, on inspection, change management advice.

Kuhn would not recognize any of these as paradigm shifts. They are, respectively: a product update, a policy adjustment, and a consulting engagement.

A genuine paradigm shift in assessment would require the field to abandon its current exemplars of what good assessment looks like and adopt new ones whose validity is not reducible to the old standard. Continuous behavioral evidence collected through ambient observation would qualify. Peer-evaluated portfolios judged by domain practitioners with no institutional gatekeeping would qualify. Outcome-only credentialing, where the credential is issued retroactively based on demonstrated work, would qualify. None of these are happening at scale. The efforts that do exist are pilots, often small, often defended by the same artifact-based logic they claim to transcend.

The colloquial use of “paradigm shift” — any significant change — empties the concept of analytical power. Kuhn himself, in The Essential Tension, expressed mild regret that the term had escaped his control. The column’s responsibility is to keep the strict sense available for the reader, so that when something actually shifts, the reader has the vocabulary to recognize it.

A useful field test: if the proposed change is fully describable in the vocabulary of the old frame (“better detection,” “more proctoring,” “expanded oral components,” “stricter policies”), it is not a paradigm shift. It is normal science. The shift, when and if it comes, will require vocabulary the field does not yet have.

What Would Move the Reading

The column owes the reader a specification. What evidence would move this analysis from “normal science under stress” to “genuine crisis” to “actual paradigm shift”?

Three concrete markers, in order of significance:

First, sustained discounting of credentials by employers in fields where the credential was previously load-bearing. If, within 18 to 36 months, major employers in law, accounting, medicine, or engineering visibly restructure hiring to bypass academic credentials in favor of practical assessments they administer themselves, that would indicate the social meaning of the credential has shifted. Press releases announcing such moves do not count. Hiring data, over multiple cycles, does.

Second, the appearance of a new exemplar — a model case of “good assessment” that a meaningful share of the field points to and works to imitate, and that is not describable as a fortified version of the old artifact-based model. The exemplar would need to be specific enough to teach. It would need apprentices learning by working through it. Conference papers and white papers do not count. Adopted practice does.

Third, the collapse or transformation of the detection-vendor segment. If detection vendors pivot away from detection toward something genuinely different — workflow integration, process documentation, capability mapping — that would indicate the segment has conceded that the detection puzzle is unsolvable within the current frame. Continued promises of better accuracy indicate the opposite: normal science still doing what normal science does.

None of these markers are present this week. What is present is intensifying defense of the existing exemplar, accompanied by rhetoric that overstates how much is changing. The honest reading from here is that the field is in the early phase of what Kuhn called the period when anomalies accumulate but the community is not yet ready to question its foundations. Whether the accumulation becomes intolerable, and on what timescale, the data does not yet tell us.

The discipline Kuhn modeled in The Copernican Revolution — a willingness to describe what a community actually believes and does, rather than what its press releases claim — is the discipline this moment requires.
