What This Shows: The Pipeline Funnel visualizes how articles flow through our data collection process, from initial search results through deduplication, text extraction, and AI evaluation to final acceptance.
Why It Matters: The funnel makes our curation process transparent. Each stage removes articles that don't meet its criteria: duplicates are merged, inaccessible articles are filtered out, and only articles that pass evaluation are included in the corpus.
How to Interpret: The narrowing flow shows how selective each stage is. The percentage between two stages is the conversion rate: the share of articles from one stage that reach the next. The overall yield is the fraction of initial search results that makes it into our curated corpus.
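
The arithmetic behind these percentages can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name `funnel_rates`, the stage labels, and all counts are hypothetical and chosen only to make the numbers easy to follow.

```python
def funnel_rates(stages):
    """Compute stage-to-stage conversion rates and the overall yield.

    `stages` is an ordered list of (name, count) pairs, first stage first.
    """
    rates = []
    # Pair each stage with the one that follows it.
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        # Guard against an empty stage to avoid division by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))

    first_count = stages[0][1]
    last_count = stages[-1][1]
    overall = last_count / first_count if first_count else 0.0
    return rates, overall


# Hypothetical counts, for illustration only.
stages = [
    ("search results", 1000),
    ("deduplicated", 800),
    ("text extracted", 600),
    ("AI evaluated", 600),
    ("accepted", 150),
]

rates, overall = funnel_rates(stages)
for label, rate in rates:
    print(f"{label}: {rate:.1%}")
print(f"overall yield: {overall:.1%}")
```

With these made-up counts, the conversion rates are 80%, 75%, 100%, and 25%, and the overall yield is 15%. The overall yield equals the product of the stage conversion rates, so a low yield can be traced to the stage with the lowest percentage.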