Spotting fakes by looking at them

On March 10, the Supreme Court said a balance has to be struck between guarding against misinformation online and protecting citizens’ right to free speech. The context was the Centre’s attempt to defend the 2023 IT Rules: when the comedian Kunal Kamra asked who would decide whether online content is “fake or misleading”, the Centre said, “When we see it, we know it is fake”.

The case has been brought by Kamra, the Editors Guild of India, and other petitioners; in its course, the Bombay High Court and the Supreme Court have been asked to weigh the constitutionality of a “Fact-Check Unit” (FCU) mandated by the national government. The petitioners have argued that giving the government the power to flag “fake, false or misleading” information will have a chilling effect on free speech and that the provisions turn the state into a judge in its own cause. The Centre’s defence — “we know it when we see it” — is, as both history and data science show, a recipe for disaster.

Solicitor General Tushar Mehta, who offered the defence on the Centre’s behalf, is likely to know that the line echoes a chaotic chapter in American legal history. In the 1964 case Jacobellis v. Ohio, US Supreme Court Justice Potter Stewart had to define obscenity. Frustrated by the lack of a precise legal definition, he famously wrote in his concurrence: “I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it.”

The legacy of this little statement was a big mess. If a Supreme Court Justice couldn’t articulate a clear standard, it was folly to expect local police and juries to do so — more so since what passed for art in Manhattan could lead to a prison sentence in rural Georgia. The immediate result was different federal circuit courts applying different tests, creating a patchwork of legal outcomes across the country. Inevitably, the situation descended into the absurd: throughout the late 1960s, US Supreme Court Justices regularly screened films in a projection room at the Court to determine whether they were “obscene”, literally deciding the law based on their own instincts and physiological reactions.

By 1973, the Court realised this was unworkable. In Miller v. California, it established a three-part framework known since as the Miller test. It asked whether the average person, applying “contemporary community standards”, would find that the work appeals to prurient interest; whether it depicts sexual conduct in a “patently offensive” way; and whether it lacks serious literary, artistic, political, or scientific value (a.k.a. the SLAPS test).

But since even this framework lacked a universal standard, the potential for harm persisted. For instance, because “community standards” were local, federal prosecutors in the 1980s and 1990s took up a practice called jurisdiction shopping: they would deliberately prosecute distributors in the most conservative parts of the country for material that was actually sold nationwide. This forced businesses to calibrate their content to the most restrictive local market in the country in order to avoid jail time — a sort of regression to the most conservative position.

The “know it when I see it” heuristic ultimately became meaningless with the coming of the internet, which allowed content producers to be located in California even as their content was served in Alabama, muddying the notion of ‘community’ and the community standards that followed from it. Federal prosecutors were eventually forced to abandon most obscenity cases altogether and shift their focus to child exploitation, which is prohibited regardless of location or community.

India’s proposed FCU threatens to replay this same history of failures. It will begin as a patchwork of censorship that depends on who’s looking at the screen when a certain clip is playing.

But the fact is nobody has to know it just by seeing it. Data science and international regulations today offer testable ways to identify misinformation.

One option is automated fact-checking that uses large databases of verified information. Instead of an official simply declaring a claim false, a system can check whether the statement connects to any documented policy decision or record. If a viral post claims that “the government has banned P”, the system can scan policy documents, gazette notifications, and other reliable databases to check whether such a decision appears anywhere. If no record exists, the claim can be flagged and labelled as unsupported. The machine need not be all that intelligent, as the bigger point here is to ensure the verdict can be traced to evidence available in the public domain.
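
To illustrate, a minimal sketch of such evidence-traceable flagging might look like the following. The corpus, the claim, and the crude keyword match are all hypothetical stand-ins; a real system would use proper document retrieval. The point is the shape of the output: a verdict tied to named records, or an explicit gap.

```python
# A sketch of evidence-traceable flagging. The corpus, the claim, and the
# keyword matching are illustrative placeholders, not a real pipeline.

CORPUS = {
    "gazette/2023/114": "The ministry notified new emission norms for two-wheelers",
    "gazette/2023/201": "Import duty on select electronics revised upward",
}

def find_supporting_documents(claim: str) -> list[str]:
    """Return IDs of documents that mention every content word of the claim."""
    words = {w for w in claim.lower().split() if len(w) > 3}  # crude content words
    return [doc_id for doc_id, text in CORPUS.items()
            if words <= set(text.lower().split())]

def label_claim(claim: str) -> str:
    """Either name the supporting records or flag an explicit gap."""
    matches = find_supporting_documents(claim)
    return f"supported by {matches}" if matches else "unsupported: no record found"

print(label_claim("the government has banned P"))
# -> unsupported: no record found
```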

Going a step further, the algorithm can treat the same claim as a path-finding problem: calculate the shortest ‘logical path’ between the nodes for “government”, “banned”, and “P” across all known policy documents. If no logical path exists, the system flags the information with a low truth-value score. This provides a quantifiable metric, moves the conversation from “I think this is fake” to “the data shows no factual connection for this claim”, and could even spare the people staffing censorship teams at social media companies considerable psychological harm. If the government makes the algorithm open-source — as it should, considering it will be in service of the public — that will add another layer of integrity.
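
A toy version of this ‘logical path’ idea, using the networkx library, could look like the sketch below. The graph’s triples are invented for the example; in practice they would be extracted from gazettes and policy documents automatically.

```python
# A toy version of the 'logical path' idea. The graph's triples are invented;
# in practice they would be extracted from gazettes and policy documents.

import networkx as nx

G = nx.Graph()
G.add_edge("government", "emission norms", relation="notified")   # hypothetical record
G.add_edge("emission norms", "two-wheelers", relation="applies to")

def truth_value(entities: list[str]) -> float:
    """Score a claim by how tightly its entities connect in the evidence graph."""
    lengths = []
    for a, b in zip(entities, entities[1:]):
        if a not in G or b not in G or not nx.has_path(G, a, b):
            return 0.0  # no logical path at all: lowest truth-value score
        lengths.append(nx.shortest_path_length(G, a, b))
    # Shorter paths between a claim's entities mean stronger evidential support.
    return 1.0 / (1.0 + max(lengths))

print(truth_value(["government", "banned", "P"]))    # 0.0 -> flag the claim
print(truth_value(["government", "two-wheelers"]))   # 0.33 -> weak but documented
```

The design choice worth noting is that the score never depends on anyone’s opinion, only on whether a path through documented evidence exists, which is exactly what makes the verdict auditable.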

Another option is to look at the European Union’s Digital Services Act (DSA), which — instead of deciding whether every individual post is true or false — has regulators ask whether a stream of information poses a systemic risk to public health, security or democratic debate. Platforms are then required to monitor patterns like how quickly a claim spreads and whether coordinated networks of accounts (e.g. bots) are pushing the same message. The focus here is thus not on the content of a single post but on the behaviour of the information as it moves through the network.
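
By way of illustration, the hypothetical sketch below computes two purely behavioural signals: how fast a claim spreads, and how concentrated its sources are. The post records and metrics are assumptions made for the example, not anything the DSA itself prescribes.

```python
# Two behaviour-based signals, in the spirit of systemic-risk monitoring.
# The post records and metrics are illustrative assumptions, not DSA rules.

from collections import Counter

def spread_signals(posts: list[dict]) -> dict:
    """Measure how fast a claim spreads and how concentrated its sources are."""
    minutes = sorted(p["minute"] for p in posts)
    span = max(minutes[-1] - minutes[0], 1)          # avoid division by zero
    velocity = len(posts) / span                     # posts per minute
    by_account = Counter(p["account"] for p in posts)
    # A handful of accounts producing most posts suggests a coordinated push.
    top3_share = sum(n for _, n in by_account.most_common(3)) / len(posts)
    return {"posts_per_minute": velocity, "top3_account_share": top3_share}

posts = [
    {"account": "a1", "minute": 0}, {"account": "a1", "minute": 1},
    {"account": "a2", "minute": 1}, {"account": "a1", "minute": 2},
    {"account": "a3", "minute": 2}, {"account": "a2", "minute": 3},
]
print(spread_signals(posts))  # high values would escalate the claim for review
```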

The Centre’s current argument, however, ignores these tools and doubles down on a standard that has failed every time it has been applied, chiefly because it creates a legal landscape in which no one knows the rules until they have already broken them. Unless, of course, this is the Centre’s aim.