5 Urban Legends of CNS Clinical Trial Methodology: Unsuccessful Solutions to the Problem of Failed Trials
Janet B. W. Williams, PhD; Danielle Popp, PhD; Scott Reines, MD, PhD; and Michael J. Detke, MD, PhD
From the College of Physicians and Surgeons, Columbia University, New York, New York (Dr Williams); MedAvante, Inc, Hamilton, New Jersey (Drs Popp and Reines); and Indiana University School of Medicine, Indianapolis (Dr Detke).
This poster presentation was supported by MedAvante, Inc.
Introduction: As the rate of failed CNS trials has grown, drug developers have adopted strategies intended to improve signal detection and reduce failures. We present 5 common strategies and evaluate their effectiveness.
Methods: 1. Increasing sample size. Because statistical power increases with sample size for a fixed effect size, it seems reasonable that larger samples should yield more successful trials. Liu et al. examined 4 depression trials to evaluate this assumption.
2. Choosing “proven” sites. Some believe that sites with proven effectiveness across several studies will continue to yield positive results. Gelwicks et al. analyzed data from sites that participated in at least 2 trials with at least 30 subjects.
3. Using experienced raters. It seems logical that more experienced raters will minimize variability and improve signal detection. Kobak et al. examined interrater agreement across 3 cohorts of raters: experienced and calibrated, experienced but non-calibrated, and inexperienced.
4. Increasing training. Variability across raters in a trial negatively affects study power and signal detection. To reduce variability, Demitrack et al. trained raters in an intensive session with videotapes and discussion.
5. Using certain regions. Many believe greater signal detection can be obtained outside the US. Khin et al. conducted a meta-analysis of FDA data on 81 US and ex-US antidepressant trials.
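The premise behind legend 1 can be made concrete: for a fixed standardized effect size, power rises with sample size, but the effect size itself is a property of the drug and population, not of enrollment. A minimal sketch using the standard normal approximation for a two-arm comparison (the effect size d = 0.3 and alpha = .05 are illustrative assumptions, not values from the trials cited):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_arm_power(d, n_per_arm, z_alpha=1.959964):
    """Approximate power of a two-sided, two-sample z-test at alpha = .05.

    d: standardized effect size (Cohen's d); n_per_arm: subjects per arm.
    """
    noncentrality = d * sqrt(n_per_arm / 2.0)
    return normal_cdf(noncentrality - z_alpha)

# Power grows with n while d stays fixed at 0.3 -- adding subjects
# buys power against a given effect; it does not enlarge the effect.
for n in (100, 200, 400):
    print(n, round(two_arm_power(0.3, n), 2))
```

Under these assumptions, power climbs from roughly .56 at 100 subjects per arm to .99 at 400, while d is unchanged; this is why the finding of Liu et al. that effect size shrank as enrollment grew undercuts the legend.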
Results: 1. Increasing sample size. In 3 positive studies, the treatment effect was observed before the first 100 subjects per treatment arm were enrolled. Effect size decreased over time despite increases in sample size.
2. Choosing “proven” sites. Site performance across consecutive studies was inconsistent (all correlations <.50).
3. Using experienced raters. Calibration appears to improve reliability over and above experience alone. Experienced and calibrated raters had the highest ICC (.93) whereas experienced and non-calibrated raters had the lowest ICC (.55).
4. Increasing training. ICCs did not improve across 6 hours of training.
5. Using certain regions. Analysis revealed that placebo response increased in both US and ex-US trials, and effect size similarly decreased in both regions.
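Legends 3 and 4 both turn on rater reliability, and classical attenuation theory makes the stakes concrete: unreliable ratings inflate outcome variance, shrinking the observed standardized effect by roughly a factor of the square root of the ICC. A minimal sketch under that textbook attenuation assumption (the true effect size d = 0.4 is illustrative; the ICCs are the .93 and .55 reported by Kobak et al.):

```python
from math import ceil, sqrt

def attenuated_d(true_d, icc):
    """Observed effect size after attenuation by rater unreliability."""
    return true_d * sqrt(icc)

def n_per_arm(d, z_alpha=1.959964, z_beta=0.841621):
    """Subjects per arm for 80% power, two-sided alpha = .05 (normal approx.)."""
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

true_d = 0.4  # illustrative true drug-placebo difference
for icc in (0.93, 0.55):
    d_obs = attenuated_d(true_d, icc)
    print(icc, round(d_obs, 2), n_per_arm(d_obs))
```

Under these assumptions, the calibrated cohort's ICC of .93 implies roughly 106 subjects per arm, while the non-calibrated cohort's .55 implies roughly 179: the same drug effect, but far more subjects needed to detect it.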
Conclusions: Strategies for improving signal detection are often used, despite a lack of clear evidence of their effectiveness. These “urban legends” are widely touted, but evidence to support them is mixed at best.
Learning objectives: understand methods commonly used to improve signal detection, and see reasons why some are ineffective.
Kobak KA, Brown B, Sharp I, et al. Sources of unreliability in depression ratings. J Clin Psychopharmacol. 2009;29(1):82–85.
Liu KS, Snavely DB, Ball WA, et al. Is bigger better for depression trials? J Psychiatr Res. 2008;42(8):622–630.