When P-values Hit the Floor: Handling Astronomically Small Statistics in QQ Plots
There is a number that sits quietly at the bottom of every R session, rarely visited:
.Machine$double.xmin
# [1] 2.225074e-308
For most statistical work, this floor is purely theoretical — something you would never expect to bump into. A p-value of 10⁻³⁰⁰ is not just statistically significant; it is so far beyond any reasonable significance threshold that one might wonder why it even matters whether the software handles it correctly.
And yet, here we are.
The QQ Plot: A Diagnostic Workhorse
The quantile-quantile (QQ) plot is one of the most important diagnostic tools in statistical genetics. Its premise is simple: if your test statistics follow their null distribution, plotting observed quantiles against expected quantiles should yield a straight diagonal line. Deviations from that line — particularly the upward sweep at the extreme right that geneticists call “inflation” — tell you something real is happening: true signal, population stratification, model misspecification, or some combination of all three.
For genome-wide association studies (GWAS), the QQ plot is as routine as checking residuals in a regression. You run your association tests across millions of genetic variants, collect the p-values, and plot them. A well-behaved null looks like a line. An inflated study looks like a ski jump.
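The construction itself takes only a few lines of base R. The sketch below is a toy illustration, not fastqq’s code: expected quantiles come from a uniform null via ppoints, observed quantiles from the sorted p-values, both on the −log₁₀ scale.

```r
# Toy QQ plot for p-values under the null (illustration, not fastqq's code).
set.seed(1)
p        <- runif(1e4)                     # null p-values are Uniform(0, 1)
observed <- -log10(sort(p))                # observed quantiles
expected <- -log10(ppoints(length(p)))     # expected quantiles under the null
plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
abline(0, 1)                               # a well-behaved null hugs this diagonal
```

True signal or inflation would show up as observed points rising above the diagonal at the right-hand end.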
The problem is that modern GWAS has grown rather large.
The Scale Problem
Early GWAS tested hundreds of thousands of variants in a few thousand individuals. Today, biobank-scale studies test tens of millions of variants in hundreds of thousands — sometimes over a million — participants. The UK Biobank, FinnGen, the Estonian Biobank, deCODE genetics in Iceland: these are not incremental improvements in scale; they represent qualitative changes in statistical power.
This creates an immediate practical problem for plotting. How do you render a QQ plot for 100 million p-values?
The naive answer — hand them to your plotting library and let it draw a point for each one — runs into trouble quickly. The original qqman R package, the long-standing standard for GWAS visualization, takes over 13 minutes to render a single plot at that scale. If you are saving to a vector graphics format, the file stores every single point, and the file size becomes absurd. What should be a 5-second diagnostic check turns into a workflow bottleneck.
This was the original motivation for fastqq: a drop-in replacement for qqman::qq that exploits a key property of QQ plots. Since the points form a monotonically increasing sequence, many of them are so close together at the dense lower end of the distribution that they are visually indistinguishable. The package filters these redundant points before rendering, achieving an 80× speedup for 100 million points — reducing a 13-minute wait to under 10 seconds.
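The principle can be sketched naively (this is only an illustration of the idea; fastqq’s actual filtering is more careful and far faster): walk the sorted points once and drop any point that would land within a negligible distance of the last point kept.

```r
# Naive monotone thinning (illustration only, not fastqq's implementation).
# Because the points are sorted, one forward pass suffices: keep a point
# only if it moves at least `tol` from the previously kept point.
thin_qq <- function(x, y, tol = 1e-3) {
  keep <- logical(length(x))
  keep[1] <- TRUE
  last <- 1
  for (i in seq_along(x)[-1]) {
    if (abs(x[i] - x[last]) >= tol || abs(y[i] - y[last]) >= tol) {
      keep[i] <- TRUE
      last <- i
    }
  }
  keep
}

p    <- sort(runif(1e5))
keep <- thin_qq(-log10(ppoints(length(p))), -log10(p))
mean(keep)   # only a small fraction of the points needs to be drawn
```

At 10⁸ points an R-level loop like this would itself be a bottleneck; the point here is only that sorted, visually redundant points can be dropped without changing the rendered image.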
A New Problem Emerges: The Extreme Tail
The speed problem was solved. But with the arrival of meta-analyses — combining results across multiple large biobank cohorts — a different issue appeared, one that required confronting those theoretical limits of floating-point arithmetic head-on.
Consider what happens when you combine GWAS results across, say, ten cohorts totaling 2 million participants for a highly heritable trait with a common variant of large effect. The resulting p-values at the top hits are not just small in the conventional GWAS sense (10⁻⁸, the genome-wide significance threshold). They can be vanishingly, almost incomprehensibly small. Values of 10⁻¹⁰⁰, 10⁻²⁰⁰, even 10⁻⁵⁰⁰ are not fantasy — they are the mathematically correct p-values that emerge from combining sufficient evidence.
R’s double-precision floating point cannot represent these numbers. Anything below approximately 2.2×10⁻³⁰⁸ underflows silently to zero. The p-value is not flagged as missing or infinite; it simply becomes 0. And fastqq, like any software working on the raw p-value scale, had no idea what to do with a zero. The plotting machinery would either error out or silently drop the point.
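The floor is easy to demonstrate at the console; this is standard IEEE 754 double behavior, not anything specific to fastqq:

```r
1e-300 * 1e-10    # 1e-310: still representable as a subnormal double
10^-400           # below even the subnormal range: silently becomes 0
# [1] 0
(10^-400) == 0    # no NA, no warning -- just a hard zero
# [1] TRUE
```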
The most extreme signals in a meta-analysis — precisely the ones you most want to inspect visually — were disappearing from the diagnostic plot.
Three Solutions for Three Scenarios
The updated fastqq addresses this through three complementary additions, each targeting a different point in the analyst’s workflow.
1. Explicit Zero Handling in qq
The simplest case: you have raw p-values, some of which have underflowed to exactly zero in the software that produced them. The original association software may have reported a p-value of zero because its own internal computation hit the floor; the true value is small but finite.
For this, qq now accepts a zero_action parameter:
pvec_with_zeros <- c(runif(1e4), 0, 0)
qq(pvec_with_zeros, zero_action = 1e-300)
# Warning: 2 p-value(s) equal to zero replaced with 1e-300.
The substitution is opt-in — you have to explicitly choose the replacement value, which forces you to think about what it means. And the function warns you loudly when it makes the substitution, including how many points were affected. This is the right behavior: silent imputation of missing extremes would be far more dangerous than a warning.
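The semantics amount to something like the following sketch (a paraphrase of the documented behavior, not fastqq’s actual source):

```r
# Sketch of the zero_action semantics (not fastqq's source code).
replace_zeros <- function(p, zero_action) {
  n_zero <- sum(p == 0)
  if (n_zero > 0) {
    warning(sprintf("%d p-value(s) equal to zero replaced with %g.",
                    n_zero, zero_action))
    p[p == 0] <- zero_action
  }
  p
}

cleaned <- replace_zeros(c(0.5, 0, 1e-20, 0), zero_action = 1e-300)
```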
2. The qqlog Function: Pre-transformed Input
A more principled approach is to compute −log₁₀(p) before the values pass through R’s floating-point representation, using whatever extended-precision machinery is available to you, and then hand those transformed values directly to the plotting function.
# Compute -log10(p) in extended precision via Rmpfr or similar,
# then plot directly:
fastqq::qqlog(my_precomputed_neglog10_pvalues)
qqlog has exactly the same signature as qq, but accepts −log₁₀(p) as its input. Internally it uses the same fast filtering logic, so performance is unchanged. The correctness of the plot now depends entirely on how you computed the transformed values — which puts control where it belongs, with the analyst.
This also matters for software interoperability: some association testing tools (particularly those designed to handle extreme statistics) output −log₁₀(p) directly rather than raw p-values, precisely because the raw values would underflow. qqlog accepts that output without any further transformation.
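As one concrete route to such input (an assumption here: any extended-precision machinery works, and the Rmpfr package must be installed), a p-value far below the double floor can be parsed from a string, transformed, and only then collapsed back to a perfectly representable double:

```r
library(Rmpfr)  # extended-precision arithmetic via the MPFR library

# Parse from a string so the value never passes through a double.
p_tiny   <- mpfr("1e-500", precBits = 240)
neglog10 <- asNumeric(-log10(p_tiny))  # ~500: fits comfortably in a double

fastqq::qqlog(neglog10)                # assuming fastqq is installed
```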
3. The qqchisq1 Function: Log-space Conversion
The most elegant solution is the one that requires the least from the user: accept the test statistic directly, and perform the conversion to −log₁₀(p) internally using log-space arithmetic that never requires the intermediate p-value to be representable as a double.
fastqq::qqchisq1(my_chisq_statistics)
The mathematics here are straightforward but easy to overlook. R’s pchisq function accepts a log.p = TRUE argument that returns log(p) rather than p. From there, a single division recovers the quantity the QQ plot needs:
−log₁₀(p) = −log(p) / log(10)
This never computes p itself, only its logarithm, which remains a perfectly ordinary floating-point number even when the p-value it encodes would be 10⁻⁵⁰⁰. A χ² statistic of approximately 2303 (on 1 degree of freedom) corresponds to a p-value around 10⁻⁵⁰⁰, far beyond R’s double-precision floor. Naively, pchisq(2303, 1, lower.tail = FALSE) returns 0. Via log-space, it returns the correct −log₁₀(p) of approximately 502:
-pchisq(2303, 1, lower.tail = FALSE, log.p = TRUE) / log(10)
# [1] 501.87
qqchisq1 uses this approach internally, so the point for that variant shows up correctly at the top of the QQ plot — without any special handling from the user and without any loss of information.
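The same conversion is easy to apply by hand when the −log₁₀(p) values are needed for other purposes. The helper name below is hypothetical, but the pchisq call mirrors the log-space approach described above:

```r
# Hypothetical helper: chi-square (df = 1) statistics to -log10(p),
# computed entirely in log-space.
chisq1_to_neglog10 <- function(stats) {
  -pchisq(stats, df = 1, lower.tail = FALSE, log.p = TRUE) / log(10)
}

x <- c(1, 10, 100, 2303)   # from unremarkable to double-breaking
chisq1_to_neglog10(x)
# The last entry comes out near 501.87, even though the raw p-value
# pchisq(2303, 1, lower.tail = FALSE) underflows to 0.
```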
Why This Matters More Now Than It Did Five Years Ago
The shift toward mega-cohort meta-analysis in complex trait genetics is not slowing down. Efforts like the Global Biobank Meta-analysis Initiative are combining data across dozens of biobanks spanning millions of participants. For common, highly heritable phenotypes — height, BMI, educational attainment, blood pressure — the top signals from these analyses will routinely produce test statistics that break double-precision arithmetic.
This is not a crisis; it is a calibration issue. But it does mean that tooling built for the GWAS scale of ten years ago is increasingly inadequate for the GWAS scale of today. A QQ plot that silently drops the most extreme points is not merely aesthetically incomplete — it obscures exactly the region of the distribution that carries the most biological and statistical information.
The fix, as implemented in fastqq, turns out to be conceptually simple: work in log-space, never materialize the raw p-value, and let the analyst supply pre-transformed values when their pipeline already handles the arithmetic correctly. What made it interesting to implement was precisely the subtlety of the failure mode — silent zero underflow is easy to miss, easy to confuse with true zeros in the data, and consequential for the diagnostic plots that the entire community relies on.
Getting Started
fastqq is available on CRAN and GitHub:
# CRAN
install.packages("fastqq")
# Development version
devtools::install_github("gumeo/fastqq")
For analysts working with meta-analysis summary statistics where extreme p-values are expected, the recommended approach is qqchisq1 if you have χ² statistics available, or qqlog if your pipeline already produces −log₁₀(p). For standard GWAS p-values with occasional zeros from software underflow, qq(..., zero_action = 1e-300) handles the common case with appropriate warnings.
The full documentation, examples, and source code are at github.com/gumeo/fastqq.
fastqq is an open-source R package released under GPL-3.0.