Written by Cicadus Team
Published on Mar 08, 2026

Citation Intent Is the X-Ray of Research Quality

Not all citations are endorsements. The way a paper gets cited tells you more about its value than how often it gets cited.

A Citation Is Not a Vote

For sixty years, bibliometrics has operated on a comfortable fiction: that being cited is inherently good. Count the citations, rank the papers, reward the researchers. Simple. Reproducible. Wrong.

The h-index doesn’t know why you were cited. Neither does the Impact Factor. A paper can accumulate hundreds of citations because it introduced a foundational method — or because it made a catastrophic error that everyone in the field felt compelled to document. The number looks identical in both cases.

This isn’t a minor accounting glitch. It’s a structural problem with how we evaluate science. And it has consequences: funding decisions, tenure reviews, journal rankings — all built on a signal that routinely confuses criticism with endorsement.

What Citation Intent Actually Reveals

Citation intent classification breaks citations into three functionally different categories: supporting, contrasting, and mentioning. The distribution matters more than the total.

Research from the scite smart citation index found that across millions of papers, roughly 92.6% of citation statements merely mention a work, 6.5% actively support its findings, and only 0.8% present contrasting evidence. That asymmetry isn’t neutral — it means a paper with a high mention count may have done little more than become a useful shorthand reference, while a paper with a lower but denser cluster of supporting citations might represent one of the most robustly validated findings in its field.

The inverse is equally revealing. A paper that accumulates contrasting citations isn’t simply unpopular — it’s a site of active scientific contest. Sometimes that’s bad (a retracted paper being repeatedly corrected). Sometimes it’s generative (a heterodox hypothesis drawing serious engagement). Intent lets you distinguish between the two.
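The difference between a shorthand reference and a robustly validated finding is visible in the shape of the intent distribution, not the raw count. A minimal sketch, with invented data and illustrative label names (this is not the scite or Cicadus API):

```python
from collections import Counter

def citation_profile(labels):
    """Summarize a paper's citation statements by intent.

    `labels` is a list of intent strings ("supporting", "contrasting",
    "mentioning"); the names and data here are illustrative only.
    """
    counts = Counter(labels)
    total = sum(counts.values()) or 1
    return {intent: counts.get(intent, 0) / total
            for intent in ("supporting", "contrasting", "mentioning")}

# Two hypothetical papers with the same raw citation count (100):
shorthand_ref = ["mentioning"] * 95 + ["supporting"] * 5
validated     = ["mentioning"] * 60 + ["supporting"] * 38 + ["contrasting"] * 2

print(citation_profile(shorthand_ref))  # almost entirely mentions
print(citation_profile(validated))      # dense supporting cluster
```

Both papers would look identical to the h-index; the profiles tell two different stories.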

92.6% mere mentions · 6.5% supporting · 0.8% contrasting

The Multidimensional Nature of Research Quality

Scholars who study research evaluation identify at least four dimensions of quality that raw citation counts fail to capture: plausibility and soundness, originality, scientific value, and societal value. These dimensions don’t correlate evenly with citation frequency.

Original, paradigm-shifting work is often under-cited in the short term — not because it’s wrong, but because it’s ahead of the citation curve. Fields with slow research fronts (much of the humanities, parts of theoretical science) accumulate citations across decades, not years. A three-year citation window — the standard in most bibliometric systems — systematically penalizes this kind of contribution.

Meanwhile, papers that provide useful methodological scaffolding get cited constantly, mostly in passing. High mention volume without supporting depth is a different kind of influence: infrastructural rather than evidential. That’s worth measuring differently.

The Tools Catching Up to the Problem

For decades, classifying citation intent at scale was technically infeasible. Manual annotation was too slow; the literature too vast. What changed is machine learning — specifically, transformer-based language models trained on scientific text that can classify the rhetorical function of a citation sentence with accuracy competitive with human annotators.
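To make the task concrete, here is a deliberately naive keyword heuristic for the same rhetorical-function classification. Real systems use fine-tuned transformer models, not cue-phrase matching; the cue lists and function name below are invented for illustration:

```python
# Toy stand-in for a trained citation-intent classifier.
# Cue phrases are illustrative, not an exhaustive or real lexicon.
SUPPORT_CUES = ("consistent with", "confirms", "in line with", "replicates")
CONTRAST_CUES = ("contradicts", "in contrast to", "fails to replicate",
                 "contrary to", "challenges")

def classify_citation_sentence(sentence):
    """Return a coarse intent label for one citation sentence."""
    s = sentence.lower()
    if any(cue in s for cue in CONTRAST_CUES):
        return "contrasting"
    if any(cue in s for cue in SUPPORT_CUES):
        return "supporting"
    return "mentioning"  # the default, and by far the most common case

print(classify_citation_sentence(
    "Our results are consistent with [12]."))        # supporting
print(classify_citation_sentence(
    "This fails to replicate the effect in [12]."))  # contrasting
print(classify_citation_sentence(
    "Prior work includes [12] and [13]."))           # mentioning
```

Keyword matching fails on exactly the hard cases discussed below (implicit critique, hedged disagreement), which is why the field moved to language models that read the full citation context.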

Tools built on this infrastructure are now beginning to make citation intent visible at the point of reading, not just retrospectively in bibliometric dashboards. Cicadus is one of them — a semantic research engine that lets you upload a paper and explore its citation network with citations automatically classified into categories: background, methods, critique, application, and more. The result is navigable; the intent is labeled.

This is a genuinely different workflow from traditional literature review. Instead of asking ‘how many times was this cited?’, you can ask ‘where is this paper being supported, where is it being contested, and where does it form part of the structural background of a field?’ The questions become more precise because the data has more resolution.
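Once intent labels are attached, those more precise questions become simple queries over the citation network. A minimal sketch with invented papers, using label names matching the categories described above:

```python
# Toy labeled citation network: (citing_paper, intent_label).
# Papers and labels are invented for illustration.
citations = [
    ("Chen 2023", "methods"),
    ("Ortiz 2024", "critique"),
    ("Lee 2022", "background"),
    ("Kumar 2024", "application"),
    ("Nguyen 2023", "critique"),
]

def where(network, label):
    """Answer 'where is this paper cited as <label>?'"""
    return [paper for paper, intent in network if intent == label]

print(where(citations, "critique"))   # ['Ortiz 2024', 'Nguyen 2023']
print(where(citations, "methods"))    # ['Chen 2023']
```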

What Intent Classification Still Can’t Do

Honest accounting requires acknowledging the limits. Citation intent classifiers work on the textual context surrounding a citation — typically a sentence or two. They can distinguish ‘this method was adapted from X’ from ‘this contradicts X’, but they struggle with implicit critique, irony, and the slow-burn reputational damage that sometimes accrues through purely neutral-seeming mentions in dismissive contexts.

Intent analysis can tell you how a paper sits within the literature. It cannot tell you what the literature has chosen to exclude.

There’s also the problem of what doesn’t get cited at all. Papers that are quietly ignored — not because they’re wrong but because they challenge a dominant paradigm — leave no citation trace for any classifier to find.

And the category ‘contrasting’ is genuinely tricky to calibrate. Sentences that present limitations, negative replications, or alternative interpretations span a range from productive scientific disagreement to motivated refutation. The same classification can mean very different things depending on the field, the era, and the specific claim being contested.

Toward a More Honest Measure

The broader shift happening in research evaluation is away from single-number proxies and toward layered, contextual assessment. Citation intent is one layer. But it pairs naturally with others: temporal citation patterns (is this paper’s influence growing or decaying?), cross-disciplinary reach (is it being cited outside its home field?), and replication status (has the finding held up under independent testing?).

What makes intent classification particularly valuable is that it surfaces the epistemic texture of a paper’s influence — not just that it mattered, but how and in what direction. A finding cited 200 times with 40 supporting instances is in a fundamentally different epistemic position than one cited 40 times with 35 supporting.
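The comparison above reduces to supporting share rather than raw count, worked through with the paragraph's own numbers:

```python
def supporting_share(total_citations, supporting):
    """Fraction of a paper's citations that actively support it."""
    return supporting / total_citations

a = supporting_share(200, 40)  # widely cited, thinly supported
b = supporting_share(40, 35)   # smaller corpus, densely validated

print(f"{a:.0%} vs {b:.1%}")   # 20% vs 87.5%
```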

The shift won’t be immediate. Institutional systems change slowly, and citation count has the advantage of being simple to calculate and easy to game in familiar ways. But the tools exist now to do better — to read the literature the way researchers actually read it, with attention to what’s being said, not just that something was said at all.

Key Sources

Aksnes, D. W., Langfeldt, L., & Wouters, P. (2019). "Citations, Citation Indicators, and Research Quality." SAGE Open.

Nicholson, J. M., et al. (2021). "scite: A smart citation index." Quantitative Science Studies, MIT Press.

Leydesdorff, L., et al. (2016). "Citations: Indicators of Quality? The Impact Fallacy." Frontiers in Research Metrics and Analytics.

Start exploring Cicadus