Antidepressants: Conundrums and Complexities of Efficacy Studies

RECENT CONCERNS
It is by now a truism in the lay press that “antidepressants are no more effective than sugar pills.” Although the best available evidence does not support this conclusion, public skepticism regarding the efficacy of antidepressants is understandable, given several recent developments. Furthermore, concerns over possible adverse effects of antidepressants, including increased risk of “suicidality” in younger populations, have continued to darken the public's perception of these agents. Recently, these controversies came to a head after a reanalysis of SmithKline Beecham's randomized, double-blind Study 329, comparing paroxetine and imipramine with placebo in adolescents with unipolar major depression. The reanalysis, which used previously confidential documents, concluded that, contrary to the original findings, “...neither paroxetine nor high dose imipramine showed efficacy for major depression in adolescents, and there was an increase in harms with both drugs.” The “harms” included clinically significant increases in suicidal ideation and behavior. In an editorial accompanying the BMJ paper, Dr David Henry commented, “It's not clear whether it was deliberate or accidental, but [the original report] wrongly gave the impression that an antidepressant drug was effective and safe in children and adolescents.” Furthermore, the issue of “ghost writing” of antidepressant studies and/or heavy involvement of pharmaceutical industry authors has received renewed scrutiny. One recent review concluded that “There is a massive production of meta-analyses of antidepressants for depression authored by or linked to the industry, and they almost never report any caveats about antidepressants in their abstracts.” In addition, the issue of publication bias in the antidepressant literature remains a thorny problem.
Thus, Turner et al compared 74 Food and Drug Administration–registered antidepressant trials submitted for regulatory approval (for 12 antidepressants involving 12,564 patients) with the published literature. They found evidence of the “file drawer effect,” that is, publication bias in favor of positive studies. Forming the backdrop to these findings are some influential meta-analyses suggesting that the efficacy of antidepressants for major depression is exaggerated and/or limited to severe depression. Taken in toto, these revelations can only deepen the public's concern, if not cynicism, regarding both the validity of antidepressant studies and possible harms from these agents. Yet, amidst the flurry of media reports on Study 329, the many nuances of antidepressant research were often obscured, and the “bigger picture” was generally missed. For despite the apparent failure of Study 329, the preponderance of research still supports modest-to-moderate efficacy, and overall safety, for several commonly used antidepressants.


BROADER CONCERNS AND COMPLEXITIES
That said, there are still many theoretical and practical problems in the way antidepressant studies are carried out and interpreted; moreover, questions remain as to the degree of major depression (mild, moderate, or severe) for which antidepressants are effective. Underlying these issues are even broader concerns, such as the marked heterogeneity in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) construct of "major depressive disorder" (MDD), 13 the dissimilarity between research subjects in antidepressant registration trials and "real-world" patients, 14 problems and limitations inherent in all meta-analyses, 15 and rising rates of the placebo response in studies of depression conducted in the past few decades. 16 To complicate matters further, the outcome of antidepressant studies may depend critically on what "level" of data is assessed. For example, based on the Hamilton Depression Rating Scale (HAMD), Gibbons et al 11 (2012) have pointed to the difference between average study-level initial severity and antidepressant response, on the one hand, and patient-level data, on the other. These researchers argue that "…relatively small overall mean differences can translate into relatively large patient-level differences in clinically interpretable and meaningful endpoints such as response and remission." Thus, Gibbons et al 11 recently examined the short-term efficacy of antidepressants for treating major depression in youth, adults, and geriatric populations. The authors carried out a reanalysis of all intent-to-treat person-level longitudinal data during the first 6 weeks of treatment of MDD. Data were derived from both published and unpublished studies, conducted by the manufacturers of fluoxetine and venlafaxine. These included 20 randomized, placebo-controlled trials of fluoxetine (12 adult, 4 geriatric, and 4 youth) and 21 adult trials of venlafaxine.
Complete longitudinal patient records were obtained, allowing the authors to examine associations between treatment response and baseline severity measured at the patient level. The study found that patients in all age and drug groups had significantly greater improvement relative to placebo controls, and baseline severity of depression did not affect symptom reduction. The authors concluded that "…The results do not support previous findings that antidepressants show little benefit except for severe depression. The antidepressants fluoxetine and venlafaxine are efficacious for major depression, in all age groups although more so in youth and adults compared with geriatric patients. Baseline severity was not significantly related to degree of treatment advantage over placebo." 11 In a companion article, Gibbons et al 12 examined suicidal thoughts and behavior in the same set of randomized, placebo-controlled studies of fluoxetine and venlafaxine and again used longitudinal patient-level data. The suicide items from the Children's Depression Rating Scale-Revised and the HAMD, as well as adverse event reports of suicide attempts and suicide during active treatment, were analyzed in 9185 patients. The study found that "Fluoxetine and venlafaxine decreased suicidal thoughts and behavior for adult and geriatric patients. This protective effect is mediated by decreases in depressive symptoms with treatment. For youths, no significant effects of treatment on suicidal thoughts and behavior were found, although depression responded to treatment. No evidence of increased suicide risk was observed in youths receiving active medication." 12 The use of longitudinal patient-level data is a genuine strength of the Gibbons et al studies. However, both studies by Gibbons et al came in for withering criticism in subsequent letters to the editor, 17,18 followed by vigorous rejoinders by Gibbons et al.
A complete discussion of these critiques is beyond the scope of this editorial, but a few points are worth noting. For example, inclusion in the Gibbons et al data of the LYAQ fluoxetine trial, which looked at subjects with comorbid attention-deficit/hyperactivity disorder and depression, may have distorted the aggregate analysis because 19% of LYAQ subjects did not have depression. 18 There is also controversy over the use of rating scales (such as the Children's Depression Rating Scale-Revised) as a measure of suicidality, as opposed to spontaneous reports of suicidal thoughts or behaviors.

OTHER STUDIES SUPPORTING ANTIDEPRESSANT EFFICACY AND SAFETY
While acknowledging shortcomings in the 2 studies by Gibbons et al, it is important to note that the results of these studies are largely consistent with several other recent analyses. For example, Vöhringer and Ghaemi 3 conducted their own reanalysis of the MDD studies from the US Food and Drug Administration database that had been analyzed by Kirsch et al. The reanalysis corrected for a statistical "floor effect" so that relative (instead of absolute) effect size differences were calculated; that is, drug-placebo differences were adjusted for baseline severity of illness. This led to an increase in nonstandardized effect size from 0.32 (as per Kirsch et al) to 0.40. Contrary to Kirsch et al (2008), who found antidepressants effective only in the most severely depressed patients, the Vöhringer and Ghaemi 3 reanalysis found that "…antidepressants are effective in acute depressive episodes that are moderate to severe…," although not in mild depressive episodes.
In partial contrast, Stewart et al 19 analyzed 6 placebo-controlled antidepressant studies of patients with nonsevere MDD (HAMD score < 23) and found that "mild-moderate MDD can benefit from antidepressants," with the number needed to treat in the range of 3 to 8 (number needed to treat < 10 is considered clinically significant). It seems fair to conclude that the effectiveness of antidepressants for "mild" cases of major depression is still unclear, but there is little doubt that antidepressants are effective acutely in moderate-to-severe major depression.
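For readers less familiar with the number needed to treat (NNT), the arithmetic can be sketched as follows. This is a minimal illustration using the standard definition (the reciprocal of the absolute difference in response rates, rounded up), not a reconstruction of the Stewart et al analysis. The function names and the response rates in the usage note are hypothetical; only the "< 10" threshold comes from the text above.

```python
import math


def number_needed_to_treat(drug_response_rate: float,
                           placebo_response_rate: float) -> int:
    """Return the NNT: 1 / (absolute difference in response rates), rounded up.

    Rates are proportions between 0 and 1. Rounding up is the usual
    convention, since a fraction of a patient cannot be treated.
    """
    absolute_difference = drug_response_rate - placebo_response_rate
    if absolute_difference <= 0:
        raise ValueError("drug response rate must exceed placebo response rate")
    return math.ceil(1.0 / absolute_difference)


def meets_threshold(nnt: int, threshold: int = 10) -> bool:
    """Apply the rule of thumb quoted in the text: NNT below 10."""
    return nnt < threshold
```

With hypothetical response rates of 50% on drug and 37.5% on placebo, `number_needed_to_treat(0.5, 0.375)` gives 8, the upper end of the range reported above; rates of 50% and 25% give 4. Both values satisfy `meets_threshold`.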
Recently, Thorlund et al 20 conducted a meta-analysis of selective serotonin reuptake inhibitors and serotonin-norepinephrine reuptake inhibitors in adults 60 years and older, using data from 15 randomized, controlled trials. With respect to achieving a partial response, the authors found "…clear evidence of the effectiveness of sertraline, paroxetine, and duloxetine" in this study population.
Furthermore, although the issue of "suicidality" in younger populations treated with antidepressants remains controversial, 2 recent analyses suggest a "neutral" effect of selective serotonin reuptake inhibitors on suicide risk in children and adolescents. 21,22 Thus, Ghaemi 21 concluded that "SRIs increase suicide risk in 1% of children, and lead to completed suicide in about 1 in 500, which is the same as their prevention rate. Their overall effect is probably neutral when benefits are weighed against harms." Similarly, Carroll 22 found "…evidence of equipoise between the therapeutic outcome of preventing suicide and any potential drug-related provocation of suicide among adolescents treated for MDD with fluoxetine." Notably, a recent consensus conference on antidepressant safety highlighted the marked ambiguity in the term "suicidality," which can mean anything from suicidal ideation to a completed suicide. The consensus authors pointedly observed that "…no deaths from suicide were reported in any of the 24 pediatric [antidepressant] trials involving 4,582 patients." 23 Indeed, suicidal ideation per se is a poor predictor of completed suicide, 24 which remains a poorly studied phenomenon; for example, randomized controlled studies routinely exclude subjects at high risk for suicide. Finally, it is important to note that, in adult populations, there are no controlled studies demonstrating increased rates of completed suicides associated with newer antidepressants. 12

ARE WE HAMSTRUNG BY THE HAMD?
The HAMD (or HDRS) is so widely used in antidepressant research that it has become nearly synonymous with measures of antidepressant efficacy. (It has sometimes been said that most antidepressant studies are illustrations of how strongly antidepressants affect the HAMD, more so than how well they treat depression.) However, the HAMD itself is subject to variability, depending on the level of experience of the rater: for example, poorly trained raters tend to produce results that diminish the effect of the antidepressant. 25 Furthermore, Bagby et al 26 (2004) have observed that "…many [HAMD] scale items are poor contributors to the measurement of depression severity; others have poor inter-rater and retest reliability." Indeed, much may depend on which version of the HAMD (HAMD17, HAMD21, HAMD24, etc) or which HAMD item cluster is used. In a key study using the HAMD6, Bech 4 (2010) reported on a "reallocation" of HAMD items, focusing on the 6 items measuring severity of clinical depression: depressed mood, guilt, work and interests, tiredness, anxiety, and psychomotor retardation. For second-generation antidepressants in placebo-controlled trials, application of the HAMD6 resulted in clinically significant effect sizes of 0.40 or greater. 4 That said, the HAMD is not necessarily the last word in measuring antidepressant response. Indeed, it is rare to find studies of antidepressant efficacy that examine "quality of life" (QOL) in study subjects, although arguably, it is precisely this factor that matters most to our patients. Two notable exceptions to the "HAMD world view" are the studies by Skevington and Wright 27 (2001) and Berlim et al 28 (2007). The Skevington and Wright study examined general practice patients with moderate depression (n = 106) by DSM-IV criteria. Subjects completed the 100-item World Health Organization Quality of Life Assessment (WHOQOL) and the Beck Depression Inventory, before the start of antidepressant treatment and 6 weeks afterward.
Depression decreased significantly for 2 months, with 74% of patients reporting that they felt better. WHOQOL scores increased in 24 of the 25 facets, "…demonstrating that QOL improves significantly in the 8 weeks following the start of antidepressant treatment…" and that "…antidepressants significantly and comprehensively improve QOL" in this sample.
In a Brazilian study by Berlim et al, 28 73 patients presenting with a severe episode of major depression were assessed by the WHOQOL BREF (26 items derived from the 100-item WHOQOL) and the Beck Depression Inventory at the start of antidepressant treatment and again after a mean of 12 weeks. The depressed patients' QOL scores significantly improved in all the assessed domains (ie, physical health, psychological, social relations, environmental, and overall QOL) during the study period. Moreover, there was significant improvement in depressive symptoms between test and retest (effect sizes ranged from 0.49 to 1.08; ie, medium-to-large effects).
Although these are relatively small, uncontrolled studies, they suggest that QOL may be enhanced by antidepressant treatment, perhaps a more important clinical measure than HAMD scores.

OTHER PROBLEMS WITH ANTIDEPRESSANT STUDIES
As noted previously, 13 the DSM-IV/DSM-5 construct of MDD is extraordinarily elastic and heterogeneous. Moreover, DSM-5 field trials have revealed "questionable" reliability (kappa = 0.20-0.39) for the diagnosis of MDD. 29 Given that a patient with MDD may have had the requisite symptoms for as little as 2 weeks, or as long as 2 years; be able to work or not be able; and so forth, it is almost inevitable that the DSM criteria will capture a wide range of MDD types and severity. Most randomized, controlled studies of MDD do not attempt to distinguish between MDD symptoms present for, say, 2 versus 12 months. Nor do most studies look specifically at subjects with MDD who meet criteria for the specifier "with melancholic features," a more serious form of MDD that is poorly responsive to placebo, compared with nonmelancholic MDD. 30 Indeed, some have argued that "melancholia" (a related construct) ought to be considered a distinct mood disorder. 31 It seems axiomatic that the presence or absence of melancholic features will result in discrepant findings, with respect to drug versus placebo differences.
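To make the kappa range quoted above more concrete, the following sketch computes Cohen's kappa for 2 raters making a yes/no diagnosis. It assumes the textbook formula, (observed agreement minus chance agreement) divided by (1 minus chance agreement), which is not necessarily the exact statistic used in the field trials. The function name and the counts in the usage note are hypothetical.

```python
def cohens_kappa(both_yes: int, only_rater_a: int,
                 only_rater_b: int, both_no: int) -> float:
    """Cohen's kappa for 2 raters and a binary (yes/no) diagnosis.

    The 4 arguments are the cells of the 2x2 agreement table:
    both raters say yes, only rater A says yes, only rater B says yes,
    and both raters say no.
    """
    total = both_yes + only_rater_a + only_rater_b + both_no
    if total == 0:
        raise ValueError("agreement table is empty")

    # Proportion of patients on whom the raters actually agree.
    observed = (both_yes + both_no) / total

    # Agreement expected by chance, from each rater's marginal "yes" rate.
    rater_a_yes = (both_yes + only_rater_a) / total
    rater_b_yes = (both_yes + only_rater_b) / total
    expected = rater_a_yes * rater_b_yes + (1 - rater_a_yes) * (1 - rater_b_yes)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

In a hypothetical sample of 100 patients in which the raters agree on 70 (20 diagnosed by both, 50 by neither) and disagree on 30 (15 in each direction), raw agreement is 70%, yet `cohens_kappa(20, 15, 15, 50)` is about 0.34, inside the "questionable" range cited above. The example shows how a seemingly respectable agreement rate can correspond to a low kappa once chance agreement is removed.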
Finally, we have yet to solve the puzzle of increasing placebo response rates in more recent studies of major depression. (Contrary to much misrepresentation in the lay press, the placebo condition is much more than "a sugar pill" and usually includes 8 or more hours of weekly, supportive contact with professional staff. 32) As 1 review noted, "Since the response to placebo is variable, often substantial, and increasing, it is not surprising that in many randomized controlled trials the response associated with placebo is similar to that associated with an established antidepressant." 16 One possible factor in this trend is the changing demographics of research subjects assigned to the placebo condition. Bridge et al 33 (2009) hypothesized that, in some multisite studies of MDD, subjects with less severe depressive illness may have been recruited, which would likely inflate placebo response rates and diminish drug-placebo differences.

CONCLUSIONS
The preponderance of data from randomized, controlled studies suggests that modern antidepressants are modestly effective and generally safe in the acute treatment of major depression. Notwithstanding these upbeat conclusions, many problems continue to complicate the research literature, including but not limited to publication bias, rising rates of the placebo response, marked heterogeneity in the construct of "MDD," and the exclusion of "real-world" patients (eg, those with comorbidities or high suicide risk) from randomized, controlled studies.
Going forward, we need to increase "transparency" in publication, by, for example, making raw study data available to journal reviewers. 34 We need to reexamine our entry criteria for antidepressant trials so that study subjects more closely resemble those we treat in clinical practice. We need to refine our recruitment methods so that highly placebo-responsive subjects are not included in randomized controlled studies. We must also address the marked heterogeneity in the DSM-5 construct of MDD, via more fine-grained examination of MDD subgroups, for example, patients with chronic and melancholic major depressive episodes. Furthermore, with respect to antidepressant efficacy, we must move beyond the HAMD toward broader assessments of "QOL." Finally, we need to intensify our outreach and education efforts so that the important role of antidepressant treatment is better appreciated by the general public.

ACKNOWLEDGMENTS
This editorial grew out of discussions with Drs David Osser, Nassir Ghaemi, Peter Kramer, Bernard Carroll, Donald Klein, and other colleagues. The opinions expressed here, however, are solely those of the author, who would also like to thank the peer reviewers for their useful suggestions.

AUTHOR DISCLOSURE INFORMATION
The author declares no conflicts of interest.