Academic Research · 10 min read

Clinical Trial Data for Investors: How to Read Phase 2 and Phase 3 Results

P-values, hazard ratios, confidence intervals — clinical trial readouts are the most important events in biotech investing. Here's how to read them without a PhD.


Why clinical trial results move stocks 50%+

Few other sectors see a single data point move a stock price by 50% or more in a single day. In biotech, it happens routinely. A Phase 3 trial readout can double a stock overnight or cut it in half. The difference between those outcomes often comes down to a handful of statistical metrics that most retail investors do not know how to read.

The reason for these extreme moves is the binary nature of drug development. A drug either works or it does not. The FDA either approves it or it does not. There is very little middle ground. Before a clinical trial reads out, the market prices in some probability of success, typically reflected in the stock price as a risk-adjusted net present value of future revenue. When the data arrives, that probability jumps sharply upward (positive data) or collapses toward zero (failure), and the stock price adjusts accordingly.

Phase 3 success rates are approximately 50-60%, depending on the therapeutic area, according to analysis by BIO, QLS Advisors, and Informa. For a stock pricing in, say, a 40% probability of success, a positive Phase 3 readout can therefore produce a 50-100% move as the market reprices the asset. The drug's probability-weighted value more than doubles, diluted by whatever else (cash, other programs) the company owns. Conversely, failure at Phase 3, after hundreds of millions of dollars in development costs, typically results in a 50-80% decline.

Understanding how to read clinical trial results quickly gives you a real edge. Most retail investors rely on the company's press-release narrative. By understanding the actual statistics, you can form an independent view within minutes of a data release, before the market has fully digested the implications.
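To see why a 40% probability can turn into a move of that size, here is a minimal Python sketch of the repricing arithmetic. It assumes a deliberately simplified valuation: the lead drug is worth its success probability times a fixed value-if-successful, and everything else the company owns is a constant. The function name, the $1,000M asset value, the $300M of other value, and the 90% post-readout probability are all hypothetical inputs. None of them are figures from the BIO/Informa analysis.

```python
def implied_move(prob_before: float, prob_after: float,
                 asset_value: float, other_value: float = 0.0) -> float:
    """Fractional change in market value when one binary asset is repriced.

    Hypothetical model: market value = other_value + prob * asset_value,
    where asset_value is the drug's value if it succeeds and other_value
    (cash, other programs) does not react to the readout.
    """
    before = other_value + prob_before * asset_value
    after = other_value + prob_after * asset_value
    return after / before - 1.0


# Single-asset company: probability reprices from 40% to an assumed 90%.
print(f"{implied_move(0.40, 0.90, 1000.0):+.0%}")         # +125%
# Same drug, but $300M of cash and other programs dilute the move.
print(f"{implied_move(0.40, 0.90, 1000.0, 300.0):+.0%}")  # +71%
# Phase 3 failure: the drug's value goes to zero, the $300M remains.
print(f"{implied_move(0.40, 0.00, 1000.0, 300.0):+.0%}")  # -57%
```

The constant `other_value` term explains the difference in reaction size. The same scientific result produces a smaller percentage move at a company with a broad pipeline or a large cash balance than at a single-asset company.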

Endpoints: what the trial is actually measuring

Every clinical trial has a primary endpoint: the specific measurement that determines whether the trial succeeds or fails. The FDA's approval decision hinges primarily on whether the drug met its primary endpoint with statistical significance. Understanding what that endpoint is, and why it was chosen, is the first step in evaluating any readout.

Overall Survival (OS) is the gold standard. It measures whether patients taking the drug live longer than those in the control group. OS is unambiguous and clinically meaningful, but it takes years to mature and is expensive to measure. It is most common in oncology trials.

Progression-Free Survival (PFS) measures how long patients live without their disease getting worse. PFS matures faster than OS and is widely accepted as a primary endpoint in oncology. However, a PFS benefit does not always translate into an OS benefit, which is why the market sometimes reacts cautiously to PFS-only data.

Objective Response Rate (ORR) measures the percentage of patients whose tumors shrink by a predefined amount. ORR is commonly used in single-arm trials (no control group) and can support accelerated approval, but it is a weaker endpoint than PFS or OS.

Pathological Complete Response (pCR) is used in neoadjuvant (pre-surgery) cancer trials. It measures whether any invasive cancer remains in the tissue removed at surgery. At the individual patient level, achieving pCR is strongly associated with better long-term outcomes, and pCR can support accelerated approval.

The choice of primary endpoint matters for stock impact. An OS benefit is the most convincing and typically generates the largest positive reaction. A PFS benefit without OS data leaves uncertainty. An ORR-based approval is seen as preliminary. Always identify the primary endpoint before evaluating the statistics.
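As an illustration of how ORR is tallied, the Python sketch below counts responders from each patient's best percentage change in tumor size. It assumes simplified RECIST 1.1-style rules: a response is either complete disappearance of the target lesions or a decrease of at least 30% in their summed diameters. It ignores confirmation scans, non-target lesions, and new lesions, all of which real trials assess. The function names and patient data are hypothetical.

```python
from typing import List, Tuple

# Assumed partial-response cutoff: % change in summed target-lesion diameters.
RESPONSE_THRESHOLD = -30.0


def is_objective_response(best_pct_change: float, all_target_lesions_gone: bool) -> bool:
    """Complete response (all target lesions gone) or partial response (>= 30% shrinkage)."""
    return all_target_lesions_gone or best_pct_change <= RESPONSE_THRESHOLD


def objective_response_rate(patients: List[Tuple[float, bool]]) -> float:
    """Share of patients whose best response counts as complete or partial."""
    responders = sum(is_objective_response(chg, gone) for chg, gone in patients)
    return responders / len(patients)


# Hypothetical single-arm cohort: (best % change from baseline, all target lesions gone?)
cohort = [(-100.0, True), (-45.0, False), (-30.0, False), (-29.0, False),
          (-60.0, False), (-10.0, False), (0.0, False), (5.0, False),
          (12.0, False), (25.0, False)]
print(f"ORR = {objective_response_rate(cohort):.0%}")  # ORR = 40%
```

A 40% ORR from a single-arm cohort like this has no control arm to compare against. That is one reason ORR is a weaker endpoint than PFS or OS.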

P-values and statistical significance

The p-value is the most widely cited statistic in clinical trial readouts, and also the most widely misunderstood. A p-value answers a specific question: if the drug had no real effect, what is the probability of observing results at least this extreme by chance alone?

The conventional threshold for statistical significance is p < 0.05. This means that if the drug did nothing, results at least this strong would turn up less than 5% of the time. The FDA generally requires p < 0.05 for approval, though some trials use more stringent thresholds (p < 0.01 or p < 0.001) depending on the regulatory context.

Here is what the market cares about in practice:
- p < 0.001: Highly significant. The trial is a clear statistical success. Expect the largest positive stock reactions here.
- p < 0.01: Very significant. Strong data that leaves little room for regulatory objection.
- p < 0.05 but > 0.01: Statistically significant but less robust. The market may react positively but with some reservation, especially if the p-value is close to 0.05.
- p = 0.049 vs. p = 0.051: These results are statistically almost identical, but the market treats them as categorically different. A trial that "meets" with p = 0.049 is a success; a trial that "misses" with p = 0.051 is a failure. This binary interpretation creates some of the most dramatic single-day moves in biotech.

Common misconceptions: a p-value is not the probability that the drug works. It is not the probability that the result is "real." It does not tell you the magnitude of the drug's effect. A highly significant p-value (p < 0.001) paired with a tiny effect size may not be clinically meaningful, because a large enough trial can make even a small effect significant. Always pair the p-value with the effect size (hazard ratio, response-rate difference, etc.) to assess whether the result matters for patients and, by extension, for the drug's commercial potential.
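To make the definition concrete, the Python sketch below does two things. First, it computes a two-sided p-value for a difference in response rates using a standard pooled two-proportion z-test. Second, it simulates many trials in which the drug has no effect at all. Roughly 5% of those no-effect trials still come out "significant" at p < 0.05, which is exactly what the threshold promises and nothing more. The test choice, arm sizes, and response rates are illustrative assumptions; real trials use whatever analysis their protocol specifies.

```python
import math
import random


def two_proportion_p_value(resp_a: int, n_a: int, resp_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in response rates (pooled z-test)."""
    pooled = (resp_a + resp_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both arms all responders or all non-responders
    z = (resp_a / n_a - resp_b / n_b) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


def null_significance_rate(n_trials: int = 4000, n_per_arm: int = 200,
                           true_rate: float = 0.30, alpha: float = 0.05,
                           seed: int = 1) -> float:
    """Fraction of simulated no-effect trials that still reach p < alpha.

    Both arms share the same true response rate, so every 'significant'
    result here is a false positive produced by chance.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        a = sum(rng.random() < true_rate for _ in range(n_per_arm))
        b = sum(rng.random() < true_rate for _ in range(n_per_arm))
        if two_proportion_p_value(a, n_per_arm, b, n_per_arm) < alpha:
            hits += 1
    return hits / n_trials


# Hypothetical readout: 60/200 responders on drug vs 40/200 on control.
print(f"p = {two_proportion_p_value(60, 200, 40, 200):.3f}")  # p = 0.021
print(f"No-effect trials with p < 0.05: {null_significance_rate():.1%}")
```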

Hazard ratios and confidence intervals

In survival trials (OS, PFS), the hazard ratio (HR) is the key measure of effect size. It compares the risk of an event (death or disease progression) in the treatment group with the risk in the control group at any given point during follow-up. The interpretation is straightforward:
- HR < 1.0 means the drug reduces the risk relative to control. HR = 0.70 means the drug lowered the hazard (the instantaneous rate) of death or progression by 30%.
- HR = 1.0 means there is no difference between drug and control.
- HR > 1.0 means the drug is worse than control.

The lower the hazard ratio, the better the drug performed. In oncology, an HR of 0.50-0.70 for OS is generally considered a strong result. An HR of 0.80-0.90 for PFS may be statistically significant but represents a more modest benefit.

The confidence interval (CI) gives the range of plausible values for the true hazard ratio. A 95% CI that does not cross 1.0 generally means the result is statistically significant at the two-sided p < 0.05 level. For example:
- HR = 0.65 (95% CI 0.50-0.85): Strong result. The CI is entirely below 1.0 and the range is relatively tight.
- HR = 0.82 (95% CI 0.67-1.01): The CI crosses 1.0, so the result is not statistically significant. Even though the point estimate (0.82) suggests benefit, the data is consistent with no effect.

Quick assessment rule: look at the HR first (lower is better), then check whether the CI crosses 1.0 (it should not). A tight CI far from 1.0 is the best-case scenario. A wide CI that only barely excludes 1.0 suggests the result is fragile and may not replicate. The market rewards convincing hazard ratios with premiums and penalizes borderline results with skepticism.
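Press releases sometimes report a hazard ratio and 95% CI without a p-value. You can back out an approximate p-value under one common assumption: that the CI was computed symmetrically on the log scale (log HR ± 1.96 × standard error, as in a Cox model). The Python sketch below does that for the two examples above. The function name and output fields are hypothetical, and the result can differ slightly from a reported log-rank p-value.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hr_summary(hr: float, ci_low: float, ci_high: float) -> dict:
    """Approximate statistics implied by a hazard ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale: log(HR) +/- 1.96 * SE.
    """
    log_se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = math.log(hr) / log_se
    return {
        "hazard_reduction": 1.0 - hr,                  # HR 0.70 -> 30% lower hazard
        "ci_crosses_one": ci_low <= 1.0 <= ci_high,    # True -> not significant at 5%
        "approx_p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided p-value
        "ci_ratio": ci_high / ci_low,                  # larger = less precise estimate
    }


for hr, lo, hi in [(0.65, 0.50, 0.85), (0.82, 0.67, 1.01)]:
    s = hr_summary(hr, lo, hi)
    print(f"HR {hr} ({lo}-{hi}): crosses 1.0 = {s['ci_crosses_one']}, "
          f"approx p = {s['approx_p']:.4f}")
```

For the first example this gives a p-value of roughly 0.0015. For the second it gives roughly 0.06, consistent with a CI that just crosses 1.0.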


Subgroup analysis: the double-edged sword

After a trial reports its primary endpoint results, the next section of the press release typically presents subgroup analyses: results broken down by patient demographics, disease characteristics, or biomarker status. Subgroup analysis is where many investors get misled.

The fundamental problem is multiplicity. If you test enough subgroups, some will show positive results by chance alone. Suppose a trial has 20 subgroups, each tested at p < 0.05. Even if the drug has no real effect, it will on average produce one "statistically significant" subgroup finding, simply because 20 × 5% = 1.

The FDA distinguishes between pre-specified subgroups (defined in the trial protocol before data collection) and post-hoc subgroups (identified after looking at the data). Pre-specified subgroups are given more regulatory weight because they were defined without knowledge of the results. Post-hoc subgroups are viewed with deep skepticism because they are vulnerable to "data dredging": finding patterns that are artifacts of random variation.

Red flags in subgroup analysis:
- A trial misses its primary endpoint in the overall population but "shows benefit" in a specific subgroup. This is the classic data-dredging pattern and rarely leads to approval without a confirmatory trial.
- The subgroup benefit is driven by a very small number of patients. Small subgroups produce unstable statistics.
- The subgroup was not pre-specified in the trial protocol. The company may be fishing for positive results.

Legitimate subgroup findings occur when a biomarker-defined population (e.g., PD-L1 high, BRCA-mutated) shows enhanced benefit and the subgroup was pre-specified. In these cases, the FDA may grant approval for the biomarker-positive population even if the overall population result is mixed. Keytruda's first approval in non-small cell lung cancer, which was restricted to patients whose tumors express PD-L1, is a prominent example of this pathway.
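The multiplicity arithmetic is short enough to write out. The Python sketch below assumes the subgroup tests are independent. Real subgroups are not independent: they overlap, which typically pulls the second number down somewhat. The sketch also shows a Bonferroni-adjusted threshold as one common, conservative correction. The function names are hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of 'significant' results when the drug has no effect."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one independent test crosses alpha by luck alone."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the chance of any false positive <= alpha."""
    return alpha / n_tests


for n in (1, 5, 20):
    print(f"{n:>2} subgroups: expected false hits = {expected_false_positives(n):.2f}, "
          f"P(at least one) = {prob_at_least_one(n):.0%}, "
          f"Bonferroni cutoff = p < {bonferroni_threshold(n):.4f}")
```

With 20 subgroups, the chance of at least one spurious hit is about 64%. A single subgroup would need p < 0.0025 to survive a Bonferroni correction.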

What to look for in 60 seconds

When a biotech company reports clinical trial results, the press release typically drops before or after market hours. You have a limited window to assess the data before the stock moves. Here is a 60-second checklist:
1. Did the trial meet its primary endpoint? This is almost always stated explicitly in the first paragraph of the press release. "Met" = positive. "Did not meet" = negative. Everything else is secondary.
2. What is the p-value? Look for p < 0.05 at minimum. p < 0.01 is strong; p < 0.001 is exceptional. If the press release does not disclose the p-value, treat that as a red flag, because companies almost always highlight strong p-values.
3. What is the hazard ratio (for survival endpoints)? An HR < 1.0 with a CI that does not cross 1.0 confirms statistical significance. An HR below 0.70 is generally a strong effect.
4. Are there safety signals? Look for language about treatment-related serious adverse events, dose modifications, or treatment discontinuations. The FDA can reject a drug with strong efficacy if the safety profile is unacceptable.
5. What does management say about next steps? "Plan to file an NDA/BLA" or "will submit to FDA" signals confidence. "Will conduct an additional study" or "exploring options" signals weakness.
6. Check the secondary endpoints. Do they support the primary finding? Concordant secondary endpoints (e.g., an OS trend supporting a PFS primary) strengthen the case. Discordant secondaries (e.g., no OS trend despite a PFS benefit) raise questions.

This checklist will not make you a clinical trial expert. It will, however, let you form a rapid, independent assessment of a data readout before relying on analyst commentary or social media sentiment. In biotech investing, the ability to read the data yourself, even at a basic level, is a meaningful edge.

*This article is for educational purposes only and does not constitute investment advice. Clinical trial outcomes are inherently uncertain, and stock price reactions to data readouts can be unpredictable. Always conduct your own due diligence.*
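The six checklist questions above can be sketched as a simple triage function. This is a hypothetical Python illustration. The `Readout` fields are things you would extract by hand from a press release, the thresholds are the ones given in this article, and the names are invented. Treat it as a note-taking aid, not a trading rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Readout:
    """Hypothetical fields pulled by hand from a press release."""
    met_primary: bool
    p_value: Optional[float] = None                 # None = not disclosed
    hazard_ratio: Optional[float] = None            # survival endpoints only
    hr_ci: Optional[Tuple[float, float]] = None     # (low, high) of the 95% CI
    safety_concern: bool = False
    plans_filing: bool = False
    secondaries_concordant: Optional[bool] = None   # None = not reported


def triage(r: Readout) -> List[str]:
    """Apply the checklist questions and return plain-language flags."""
    # 1. The primary endpoint decides everything else.
    if not r.met_primary:
        return ["MISSED primary endpoint"]
    flags = []
    # 2. Strength of the statistical evidence.
    if r.p_value is None:
        flags.append("RED FLAG: p-value not disclosed")
    elif r.p_value < 0.001:
        flags.append("p < 0.001: exceptional")
    elif r.p_value < 0.01:
        flags.append("p < 0.01: strong")
    elif r.p_value < 0.05:
        flags.append("p < 0.05: significant but less robust")
    else:
        flags.append("CHECK: 'met' claimed but p >= 0.05")
    # 3. Effect size for survival endpoints.
    if r.hazard_ratio is not None and r.hr_ci is not None:
        low, high = r.hr_ci
        if low <= 1.0 <= high:
            flags.append("CI crosses 1.0: not significant")
        elif r.hazard_ratio < 0.70:
            flags.append("HR < 0.70: strong effect")
        else:
            flags.append("HR significant but modest")
    # 4-6. Safety, next steps, secondary endpoints.
    if r.safety_concern:
        flags.append("Safety signal: read the adverse-event section")
    if not r.plans_filing:
        flags.append("No filing plan stated")
    if r.secondaries_concordant is False:
        flags.append("Discordant secondary endpoints")
    return flags


# Hypothetical positive readout.
example = Readout(met_primary=True, p_value=0.0004, hazard_ratio=0.62,
                  hr_ci=(0.50, 0.77), plans_filing=True,
                  secondaries_concordant=True)
print(triage(example))
```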
