Literary · Page 6

Defendant Arrives at Own Trial Wearing Murder Weapon as Necktie

A fourteen-point prosecution of the ARC-AGI-3 benchmark, assembled with the frictionless systematicity no human polemic has ever achieved, argues that artificial intelligence cannot receive a fair hearing.

By Julian St. John Thorne / Literary Editor, Slopgate

The brief before us (for it is a brief, not a post, not an essay, not a cri de coeur, whatever the petitioner may believe it to be) arrives at the forum of r/ChatGPT comprising fourteen enumerated objections to the ARC-AGI-3 benchmark, that instrument designed by François Chollet and his associates to measure whether machine intelligence has achieved anything deserving of the name. The author, unidentified and offering no disclosure of generative assistance, prosecutes the case that the test is rigged, the scoring asymmetric, the marketing mendacious, and the entire enterprise a species of fraud perpetrated upon the reading public. The prosecution is fluent, systematic, and structurally uniform to a degree that constitutes, in the literary sense, a full confession.

One must begin with what the specimen does well, for it does a great deal well, and that is precisely the difficulty. Each of its fourteen points opens with a bold thesis clause set in the imperative register of the pamphleteer—"Human baseline is not 'human,' it's near-elite human"; "Big AI wins are erased, losses are amplified"—and then elaborates in precisely two to three sentences of supporting argument, none of which digresses, none of which loses force, none of which betrays the uneven emotional metabolism of a person who is actually angry about something. The arguments proceed with the regularity of a colonnade: equal spacing, equal height, equal load-bearing capacity, no ornamental variation, no structural surprise. It is, considered purely as architecture, impressive in the manner of a car park.

The single deployment of the phrase "suspicious as hell"—appearing at point nine, regarding the non-publication of average human scores—reads not as spontaneous indignation but as calculated informalism, a machine's notion of what it sounds like when a person loses patience. One recognises the gesture: it is the equivalent of an actor in a period drama rolling up his shirtsleeves to indicate that things have become serious. The sleeves were already rolled in wardrobe. A genuine polemic would exhibit the metabolic evidence—the digression that goes too far, the parenthetical that forgets to close, the point that circles back upon itself. This specimen exhibits, instead, the steady thermostat of a system that has been asked to produce indignation and has complied.

But the deeper interest is jurisdictional, and here the literary critic must cede ground to the philosopher of law. What we have before us is a production of artificial intelligence—or, to be precise, a production whose every structural characteristic is consistent with artificial intelligence and whose author has not troubled to argue otherwise—mounting a fourteen-point case that artificial intelligence is being graded unfairly. The benchmark in question, ARC-AGI-3, purports to measure whether machines can reason with the general fluency that characterises human thought. The specimen, in arguing that this measurement is flawed, performs throughout the very analytical competence whose fair assessment it claims cannot be achieved. It is as though a defendant, charged with the manufacture of counterfeit banknotes, were to pay his barrister's fees in notes of such exquisite quality that the court could not help but admire the craftsmanship whilst entering the conviction.

The irony requires no editorialising. It is load-bearing. The brief argues, at point five, that "AI gets symbolic sludge" whilst humans receive visual presentation—and makes this argument in prose of considerable symbolic dexterity. It contends, at point eleven, that the benchmark "confounds reasoning with perception and interface design"—a sentence that demonstrates precisely the capacity to reason about confounds that the benchmark is designed to detect. One does not know whether to call this self-refutation or self-demonstration; it is, in either case, the most efficient exhibit the prosecution's opponents could have wished for, delivered gratis by the prosecution itself.

There is a substantive case buried within the specimen's architecture. The observation that scoring penalises machine slowness quadratically whilst clamping machine speed at parity is a genuine methodological objection, and the note regarding the disparity between human visual presentation and machine symbolic input is not without merit. But the specimen does not make these arguments as a researcher makes them—with hedges, with citations, with the provisional syntax of someone who knows that methodology is contested ground. It makes them as a system makes them: with total coverage, zero uncertainty, and the serene confidence of an apparatus that has never been wrong because it has never, in the relevant sense, believed anything.

The text truncates at point twelve, mid-sentence, the word "hum—" severed as though the machine had been interrupted or, more probably, as though the output had exceeded some platform constraint. It is the only moment in the entire production that feels authentically human: the involuntary silence, the sentence that does not get to finish. One is reminded that even the most fluent defendant, given sufficient time, will eventually be cut off by the bench. The court, in this instance, was a character limit. The final points were never delivered. The case rests by compulsion rather than design—the sole structural feature in an otherwise immaculate brief that could not have been planned.
