ClusterFunk
July 2026
We read thousands of research papers each month so you don't have to

How ClusterFunk finds what matters

Every month we scan tens of thousands of peer-reviewed research papers and run them through a pipeline of clustering, statistical modeling, and AI synthesis. Here is exactly what happens.

1. Finding the signals

We start with peer-reviewed research papers — the primary record of what scientists and engineers are actually working on, before it appears in products, press releases, or funding announcements. Academic research leads commercial deployment by three to seven years on average, which makes it the best early-warning system we have found.

Collect papers across 27 technology domains

We search for papers across domains ranging from large language models to solid-state batteries to CRISPR gene editing. Each domain is defined by a precise set of search terms to minimize contamination from unrelated fields. On the first run we fetch up to five years of history per domain — up to 10,000 papers per year — to build a deep baseline. Every subsequent monthly run fetches only new papers since the prior update, keeping coverage current without re-fetching what we already have.

Convert papers to numerical fingerprints

Each paper is converted into a mathematical representation that captures its meaning. We use TF-IDF (a technique that weights words by how distinctive they are to this paper versus the broader field) followed by dimensionality reduction to compress thousands of word features into 100 core dimensions. Papers about similar ideas end up close together in this space.
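The weighting step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production pipeline: it tokenizes on whitespace, uses a smoothed IDF formula chosen for the example, and skips the reduction to 100 dimensions. The function names and sample abstracts are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): rarer terms get larger weights.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors

def cosine(a, b):
    """Cosine similarity of two L2-normalized sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Hypothetical abstracts, shortened to a few keywords each.
abstracts = [
    "solid state battery electrolyte ionic transport",
    "lithium battery electrode ionic conductivity",
    "crispr gene editing guide rna delivery",
]
vecs = tfidf_vectors(abstracts)
battery_pair = cosine(vecs[0], vecs[1])
unrelated_pair = cosine(vecs[0], vecs[2])
```

Here the two battery abstracts share terms and score above zero, while the battery and CRISPR abstracts share none and score zero. That is the sense in which papers about similar ideas end up close together.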

Cluster by density — no predetermined buckets

We use HDBSCAN, a density-based clustering algorithm that finds natural groupings in the data without being told how many clusters to look for. Papers that are genuinely similar cluster together. Papers that do not fit anywhere are flagged as noise and excluded. The result is a set of coherent research threads — what we call signals.

Name each signal using AI

Each cluster of papers is sent to a large language model that reads the full abstracts and writes a forward-looking name describing what this research thread is building toward — not what the papers are called, but what they unlock. For example, a cluster of papers about battery electrode chemistry might be named "Transforms Ionic Transport for Longer-Lasting EV Batteries."

Why this is different from reading the news. News covers announcements. Research covers discovery. The signal that a technology is about to break through often appears in academic papers 24 to 36 months before it shows up in a product launch. ClusterFunk is built to catch that window.

2. Measuring where each signal sits on the adoption curve

Knowing that a research thread exists is useful. Knowing whether it is six months old or six years old — and whether it is accelerating or plateauing — is far more useful. We measure this using the Bass diffusion model.

Originally published in 1969 to forecast sales of consumer durables, the Bass model has since been validated on hundreds of technologies, markets, and behavioral patterns. It describes how any new idea spreads through a population: first through pioneers who adopt independently, then through the majority who adopt because others around them have.

The S-Curve — Five Stages of Technology Adoption

Innovators (2.5% of field): Pioneering researchers exploring the unknown

Early Adopters (13.5% of field): Teams building the first real applications

Early Majority (34% of field): Mainstream practitioners adopting proven methods

Late Majority (34% of field): Industry-wide adoption becomes standard practice

Saturation (16% of field): Fully mature, research focus shifts to optimization
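One way to read the five shares is as consecutive bands of cumulative adoption, with upper bounds at 2.5%, 16%, 50%, 84%, and 100%. The sketch below assumes that reading, and assumes each upper bound belongs to the lower stage. The function name is hypothetical.

```python
# Stage shares as listed above. Cumulative upper bounds are derived from them.
STAGES = [
    ("Innovators", 2.5),
    ("Early Adopters", 13.5),
    ("Early Majority", 34.0),
    ("Late Majority", 34.0),
    ("Saturation", 16.0),
]

def adoption_stage(cumulative_pct):
    """Map cumulative adoption (0 to 100) to a stage name."""
    if not 0.0 <= cumulative_pct <= 100.0:
        raise ValueError("cumulative_pct must be between 0 and 100")
    upper = 0.0
    for name, share in STAGES:
        upper += share
        # Assumption: a value exactly on a boundary stays in the lower stage.
        if cumulative_pct <= upper:
            return name
    return STAGES[-1][0]
```

For example, `adoption_stage(10.0)` falls in the 2.5% to 16% band and returns "Early Adopters".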

We fit the Bass model to the publication volume curve of each research cluster, that is, how many papers appeared each month across the full paper coverage window for that topic. The model returns two key parameters, which we report together with a measure of fit quality:

Innovation coefficient (p)

How fast pioneering researchers are independently discovering and publishing on this topic. A high p means the field is being driven by breakthroughs, not followership.

Imitation coefficient (q)

How fast the broader research community is piling in because others are doing it. A high q relative to p means the field is network-driven — momentum feeds momentum.

R² (model fit quality)

How well the Bass curve actually fits the data. R² above 0.85 means the curve tracks the observed publication history closely, which supports projecting it forward. Below 0.5 means the trend is too early or too irregular for confident projection, which we display as a low-confidence flag on the adoption chart.

The model output tells us the current adoption percentage — where this research thread sits on the S-curve right now — and projects when peak research activity will occur. On the adoption chart, "Today" is always visible so you can see exactly how much of the curve is ahead versus behind.
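The quantities above can be sketched with the standard closed-form Bass curve. This is an illustration under several assumptions, not a description of the production fit: it fits cumulative paper counts, treats the eventual total (`market_size`) as known, and uses a coarse grid search where a real pipeline would more likely use nonlinear least squares. All function names are hypothetical.

```python
import math

def bass_cdf(t, p, q):
    """Cumulative adoption fraction F(t) of the Bass model, for t >= 0."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def peak_time(p, q):
    """Time at which the adoption rate peaks (meaningful when q > p)."""
    return math.log(q / p) / (p + q)

def r_squared(observed, predicted):
    """Coefficient of determination between two equal-length sequences."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_bass(cumulative, market_size):
    """Coarse grid search for (p, q) minimizing squared error."""
    best = None
    for i in range(1, 51):            # p from 0.001 to 0.050
        p = i / 1000.0
        for j in range(1, 101):       # q from 0.01 to 1.00
            q = j / 100.0
            pred = [market_size * bass_cdf(t, p, q)
                    for t in range(len(cumulative))]
            err = sum((c - f) ** 2 for c, f in zip(cumulative, pred))
            if best is None or err < best[0]:
                best = (err, p, q, pred)
    _, p, q, pred = best
    return p, q, r_squared(cumulative, pred)

# Synthetic monthly cumulative counts generated from known parameters.
true_p, true_q, m = 0.01, 0.4, 1000.0
series = [m * bass_cdf(t, true_p, true_q) for t in range(24)]
p_hat, q_hat, r2 = fit_bass(series, m)

# Where the topic sits on the curve at the last observed month, and when it peaks.
current_pct = 100.0 * bass_cdf(len(series) - 1, p_hat, q_hat)
months_to_peak = peak_time(p_hat, q_hat)
```

Because the synthetic series was generated from parameters that lie on the grid, the search recovers them and R² is essentially 1. Real publication counts are noisier, which is when the R² thresholds become informative.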

3. Measuring velocity across all topics

Adoption stage tells you where a field is. Velocity tells you how fast it is moving. We measure velocity as a cross-topic percentile rank — so the number means something relative, not absolute.

For each topic, we compute a six-month growth ratio: papers published in the last six months divided by papers published in the prior six months. We then rank all 27 topics against each other. A topic at the 84th percentile is growing faster than 84% of all tracked topics right now. This drives the vertical axis on the radar chart.
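The calculation can be sketched as below. It is a minimal sketch with two assumptions that the text does not specify: a topic with no papers in the prior window gets an infinite ratio, and the percentile counts only the other topics that are strictly slower. The function names and sample counts are hypothetical.

```python
def growth_ratio(monthly_counts):
    """Papers in the last six months divided by papers in the six months before."""
    if len(monthly_counts) < 12:
        raise ValueError("need at least 12 months of counts")
    recent = sum(monthly_counts[-6:])
    prior = sum(monthly_counts[-12:-6])
    # Assumption: no prior papers is treated as infinitely fast growth.
    return recent / prior if prior else float("inf")

def percentile_ranks(ratios):
    """Percent of the other topics that each topic is growing faster than."""
    ranks = {}
    for topic, ratio in ratios.items():
        slower = sum(1 for other, value in ratios.items()
                     if other != topic and value < ratio)
        ranks[topic] = 100.0 * slower / (len(ratios) - 1) if len(ratios) > 1 else 0.0
    return ranks

# Hypothetical monthly paper counts, oldest month first.
counts = {
    "topic_a": [10] * 6 + [20] * 6,   # doubling
    "topic_b": [10] * 12,             # flat
    "topic_c": [20] * 6 + [10] * 6,   # halving
}
ratios = {topic: growth_ratio(series) for topic, series in counts.items()}
ranks = percentile_ranks(ratios)
```

With these three topics the ratios are 2.0, 1.0, and 0.5, so the ranks come out as 100, 50, and 0. The ranking step is what makes the number relative rather than absolute.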

Fast does not mean good. Slow does not mean bad. A topic in Saturation with low velocity is a mature, established field — worth knowing about but not urgently acting on. A topic in Innovators with high velocity is a field that is just beginning to accelerate — potentially the most important place to be paying attention right now.

4. Detecting convergences

Some of the most important moments in technology happen when two previously separate fields begin working on the same problems. Convergences are our signal that a boundary is dissolving — historically a leading indicator that a new category is about to emerge.

We detect convergences using two independent signals. First, we look at every paper across all 27 topics and find papers that appear in more than one topic's cluster set — shared papers mean researchers in both fields are tackling the same problems with the same tools. Second, we measure the semantic similarity between topics using TF-IDF text embeddings, so convergences can be detected even before paper overlap becomes large.

How Convergence Is Detected

[Diagram: Topic A (1,200 papers) and Topic B (800 papers) overlapping in a set of shared papers]

◈ Convergence score: 68%

65% paper overlap (log-normalized) + 35% semantic similarity, cross-category pairs only

The convergence score blends 65% paper overlap (log-normalized against the smaller topic to prevent large topics from overwhelming small ones) with 35% semantic similarity between cluster text. Critically, we only surface convergences across different research categories — two robotics topics converging is structurally expected and not interesting. The signal worth watching is when a materials science cluster starts merging with a drug delivery cluster, or when energy storage research starts mirroring semiconductor manufacturing. Only cross-category pairs scoring above 35% are surfaced.
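The blend can be sketched as follows. The 65/35 weights and the 35% threshold come from the description above. The exact log normalization is not specified, so the form used here, log(1 + shared) / log(1 + size of smaller topic), is an assumption, and the function names are hypothetical. The sketch is not meant to reproduce the 68% figure in the diagram.

```python
import math

OVERLAP_WEIGHT = 0.65
SEMANTIC_WEIGHT = 0.35
THRESHOLD = 0.35

def overlap_score(papers_a, papers_b):
    """Shared-paper count, log-normalized against the smaller topic.

    Assumed form: log(1 + shared) / log(1 + min(|A|, |B|)).
    """
    smaller = min(len(papers_a), len(papers_b))
    if smaller == 0:
        return 0.0
    shared = len(papers_a & papers_b)
    return math.log1p(shared) / math.log1p(smaller)

def convergence_score(papers_a, papers_b, semantic_similarity):
    """Weighted blend of paper overlap and semantic similarity, both in [0, 1]."""
    return (OVERLAP_WEIGHT * overlap_score(papers_a, papers_b)
            + SEMANTIC_WEIGHT * semantic_similarity)

def surfaced(score, category_a, category_b):
    """Only cross-category pairs above the threshold are shown."""
    return category_a != category_b and score > THRESHOLD

# Hypothetical paper IDs: 1,200 and 800 papers with 50 in common.
topic_a = set(range(1200))
topic_b = set(range(1150, 1950))
score = convergence_score(topic_a, topic_b, semantic_similarity=0.4)
shown = surfaced(score, "energy", "materials")
```

Normalizing against the smaller topic means a small topic that shares most of its papers with a large one still scores high, which is the stated purpose of the normalization.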

Why convergences matter more than individual trends. Single trends improve what exists. Convergences create entirely new categories. When quantum computing researchers start publishing alongside drug discovery researchers, that is not an incremental improvement to either field — it is the early signal of quantum pharmaceuticals as a distinct discipline.

5. The PEST impact framework

Every research cluster is scored across four dimensions of real-world impact. This gives you a quick read on which forces a technology primarily disrupts — and where to look for consequences beyond the obvious.

Political

Regulatory change, policy risk, government competition

Economic

Market disruption, cost curves, new business models

Social

Workforce change, access, cultural and behavioral shifts

Technological

Platform enablers, infrastructure, capability dependencies

PEST scoring is derived from the content of the papers themselves. We analyze which words and concepts in each paper relate to political, economic, social, or technological themes and aggregate these scores across all papers in a cluster. The dominant dimension is the one that shows up most strongly across the research — not our editorial judgment.
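As a rough illustration of content-derived scoring, the sketch below counts keyword hits per dimension and aggregates them across a cluster. The lexicon, the whitespace tokenization, and the function names are all hypothetical. The actual word lists and weighting are not described here.

```python
from collections import Counter

# Hypothetical keyword lexicon, for illustration only.
PEST_LEXICON = {
    "Political": {"regulation", "policy", "government", "compliance"},
    "Economic": {"cost", "market", "price", "investment"},
    "Social": {"workforce", "access", "behavior", "community"},
    "Technological": {"platform", "infrastructure", "hardware", "architecture"},
}

def pest_scores(abstracts):
    """Share of keyword hits per dimension, aggregated over all abstracts."""
    totals = Counter({dim: 0 for dim in PEST_LEXICON})
    for text in abstracts:
        words = Counter(text.lower().split())
        for dim, keywords in PEST_LEXICON.items():
            totals[dim] += sum(words[k] for k in keywords)
    grand_total = sum(totals.values())
    return {dim: (totals[dim] / grand_total if grand_total else 0.0)
            for dim in PEST_LEXICON}

def dominant_dimension(scores):
    """The dimension with the largest aggregated share."""
    return max(scores, key=scores.get)

# Hypothetical cluster of three short abstracts.
cluster = [
    "battery cost falls as market scales",
    "new regulation affects cost of recycling",
    "grid infrastructure needs investment",
]
scores = pest_scores(cluster)
dominant = dominant_dimension(scores)
```

In this toy cluster four of the six keyword hits are economic, so Economic is the dominant dimension. The result follows from the text of the papers rather than from a manual label.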

6. The AI synthesis layer

The clustering and modeling steps tell you what is happening and when. The AI synthesis layer tells you what it means for someone who is not a scientist.

For the top research clusters in each topic, we send the full text of all papers in the cluster — not just a sample — to a large language model with explicit instructions to write for an intelligent business reader, not a research audience. The model is required to synthesize the full body of research into a narrative that explains the core innovation, projects timing from the Bass model output, names specific industries that will be affected, and identifies the key risks emerging from the research itself.

We also generate a "What if?" hook for each cluster — a single provocative question that captures the most consequential implication of the research. These are designed to make you stop and think, not to alarm or sensationalize.

Common questions

How often does ClusterFunk update?

The pipeline runs monthly. Each monthly run fetches new papers published since the prior update, re-clusters everything from scratch using the full historical corpus, re-fits the Bass models, and regenerates the narratives. On first setup a backfill run fetches up to five years of history per domain to give the Bass model enough data to make confident projections. This means adoption stage and velocity numbers improve over time as coverage deepens.

Can I see what changed month to month?

That is on the roadmap. We are building a "What's Changed" view that will show you, for each topic, which signals are new this month, which moved meaningfully on the adoption curve, and which convergences appeared or strengthened since last month. The Paper Coverage range shown on each topic page already reflects the actual span of data feeding the model — this will grow as monthly runs accumulate.

How do you decide which topics to track?

Topics are selected based on research volume (there need to be enough papers to cluster meaningfully), strategic importance (aligned with frameworks used by leading technology foresight organizations), and cross-topic convergence potential. We currently track 27 domains across AI, robotics, computing, life sciences, energy, and communications.

How accurate is the Bass model for projecting adoption?

The Bass model generally fits well for topics with at least two to three years of publication history. For very new fields (under 12 months of data), the model fit is weak and we flag this with a low-confidence indicator. We show R² on every adoption chart so you can judge the quality yourself.

Could the clustering group unrelated papers together?

Yes, occasionally. The clustering is statistical, not curated. If a word or phrase appears in two otherwise unrelated research areas, papers from both may end up in the same cluster. We use several quality filters — minimum cluster size, minimum paper density — but some noise is inevitable. The cluster name, thesis, and key papers are the best way to sanity-check whether a cluster is coherent.

Why peer-reviewed research and not patents, news, or job postings?

Patents lag research by two to four years and are strategically obscured. News covers what companies want to announce. Job postings reflect current deployment, not future direction. Peer-reviewed research is the primary record of what scientists believe is real and important — before the commercial layer arrives.
