Comparative efficacy of non-steroidal anti-inflammatory drugs in ankylosing spondylitis: a Bayesian network meta-analysis of clinical trials

Objective To compare the efficacy of 20 non-steroidal anti-inflammatory drugs (NSAIDs) in the short-term treatment of ankylosing spondylitis (AS).
Methods We performed a systematic literature review of randomised controlled trials of NSAIDs in patients with active AS. We included trials that reported efficacy at 2–12 weeks. Efficacy outcomes were the change in pain score and change in the duration of morning stiffness. We also examined the number of adverse events. We used Bayesian network meta-analysis to compare effects directly and indirectly between drugs.
Results We included 26 trials (66 treatment arms) of 20 NSAIDs with 3410 participants in the network meta-analysis. Fifty-eight per cent of trials had fewer than 50 participants. All 20 NSAIDs reduced pain more than placebo (standardised mean difference ranging from −0.65 to −2.2), with 15 NSAIDs significantly better than placebo. Etoricoxib was superior to celecoxib, ketoprofen and tenoxicam in pain reduction, but no other interdrug comparisons were significant. There were no significant differences among NSAIDs in decreases in the duration of morning stiffness or the likelihood of adverse events. Adverse events were uncommon in these short-term studies. In 16 trials that used NSAIDs at full doses, etoricoxib was superior to all but two other NSAIDs in pain reduction.
Conclusions Etoricoxib was more effective in reducing pain in AS than some other NSAIDs, but there was otherwise insufficient evidence to conclude that any particular NSAID was more effective in the treatment of AS. Comparisons were limited by small studies.

Non-steroidal anti-inflammatory drugs (NSAIDs) form the cornerstone of pharmacological treatment of ankylosing spondylitis (AS). Currently, 11 NSAIDs are approved for the treatment of AS in Europe and 5 are approved in the USA; additional NSAIDs are approved for other indications and are available for use. 1 A question that naturally arises when there are many treatment options is whether any particular NSAID is more effective and safer for the typical patient. In the past, phenylbutazone was considered the NSAID of choice for the treatment of AS, but it was supplanted by NSAIDs without similar risk of bone marrow suppression. 2 Indomethacin has also been favoured as a particularly effective NSAID in patients with AS, despite the lack of randomised controlled trials (RCTs) demonstrating superior efficacy. 3 4 More recently, cyclooxygenase-2 (COX-2) selective inhibitors, such as celecoxib and etoricoxib, have shown efficacy in AS similar to that of non-selective NSAIDs, including naproxen, 5 6 diclofenac 7 and ketoprofen, 8 with lower risk of gastrointestinal (GI) toxicity. Head-to-head RCTs comparing two or more NSAIDs have been performed in AS, but a comprehensive comparison among the most commonly used NSAIDs is not available. For example, indomethacin and piroxicam have not been compared with COX-2 inhibitors in any trials. Information on the relative efficacy and safety of different NSAIDs would aid clinical decision-making. In the absence of direct comparisons, indirect comparisons of two or more drugs can be made through a common comparator using Bayesian network meta-analysis. 9 We conducted a systematic literature review and meta-analysis of RCTs to assess the relative efficacy and safety of 20 NSAIDs in the short-term treatment of AS.

Literature search
The study protocol was registered at PROSPERO (registration number CRD42014014329). With the help of a medical informationist, we searched PubMed, EMBASE, Scopus and the Cochrane Database for published RCTs of NSAIDs in AS from 1 January 1960 to 31 December 2014 in all languages. Search terms are provided in online supplementary table S1. We also performed manual searches of reference lists of reviews and searched ClinicalTrials.gov and clinicaltrialsregister.eu for RCTs. Two authors (RW and MMW) independently reviewed the search results for eligible studies. Disagreements were resolved by discussion.

Selection criteria
Studies were eligible if they were RCTs evaluating the efficacy and safety of an NSAID compared with placebo or a different NSAID in adults with active AS and reported outcomes at 2-12 weeks. AS was defined by the modified New York criteria; for trials conducted before 1984, we included trials that required radiographic sacroiliitis to establish a diagnosis of AS. To enhance homogeneity, we excluded trials of axial spondyloarthritis, unless a subgroup analysis of patients with AS was reported. We also excluded trials with concomitant use of other anti-inflammatory drugs, such as corticosteroids, aspirin, immunosuppressants or biologics. Finally, we excluded trials that did not report relevant efficacy outcomes or measures of variance for the outcomes.

Data extraction and assessment of bias
Data extraction was done independently by two authors (RW and MMW). We extracted aspects of the study design, participant characteristics and relevant outcomes. For cross-over trials, if a washout period was present before the cross-over and results of the two phases were reported separately, the results of the second phase were recorded as a separate study.
We analysed the change in total pain score and change in duration of morning stiffness as the efficacy outcomes. When the total pain score was not reported, we used the results for spinal pain, night pain, pain at rest and day pain, in this order of priority. We either abstracted the mean change score and its SD, or estimated the change from baseline and final scores. When only medians and ranges were reported, we imputed the means by standard methods. 10 Pain scores were assessed using 10 cm visual analogue scales, 5-point Likert scales or other numerical rating scales. To create a uniform scale for analysis, we converted mean differences to standardised mean differences, which are defined as the effect divided by the baseline variability. 11 Data on duration of morning stiffness were collected similarly. Because these data were always reported as minutes, we analysed mean differences for morning stiffness.
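The two conversions described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the median-and-range rule shown, mean ≈ (min + 2 × median + max) / 4, is one commonly used approximation that is assumed here, because the text cites only "standard methods".

```python
def mean_from_median_and_range(minimum, median, maximum):
    """Approximate a mean from a reported median and range.

    Assumption: uses the (min + 2 * median + max) / 4 rule. The paper says
    only 'standard methods', so the rule it actually applied is not known.
    """
    return (minimum + 2 * median + maximum) / 4


def standardised_mean_difference(change_treatment, change_comparator, baseline_sd):
    """Difference in mean change between two arms, divided by the baseline SD."""
    if baseline_sd <= 0:
        raise ValueError("baseline_sd must be positive")
    return (change_treatment - change_comparator) / baseline_sd


# Hypothetical numbers on a 0-100 pain scale (not taken from any trial).
smd = standardised_mean_difference(-30.0, -10.0, 20.0)
print(smd)
```

Dividing by the baseline SD puts pain scores from visual analogue, Likert and numerical rating scales on one unitless scale, which is what allows them to be pooled in a single network.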
For safety outcomes, we examined the numbers of patients with any adverse event (total AE), GI adverse events and GI bleeding. We defined GI adverse events as any GI complaints, including nausea, vomiting, diarrhoea and epigastric or abdominal pain, but excluding haematemesis and haematochezia. In trials that reported the number of events, we used this to impute the number of affected patients.
We used the Cochrane Collaboration's tool for assessment of risk of bias to assess study quality. 10 We scored each study on six domains (sequence generation, allocation concealment, blinding, incomplete outcome data, selective reporting, other sources of bias) as high risk, low risk or unclear risk of bias. We considered a study as low risk for bias due to incomplete outcome data if either intention-to-treat analysis was performed or the loss to follow-up proportion was less than 10%.

Statistical analysis
To include cross-over and parallel trials in the analysis without breaking the randomisation of the cross-over design, we computed the relative effect between the two drugs in each cross-over trial, while for parallel studies we considered the effect of each study arm in the trial. For trials that assessed more than one dose of an NSAID, we combined the effects of the doses. Intention-to-treat data were used whenever available. For studies that did not report intention-to-treat analyses, we adjusted the end-of-follow-up values by assigning baseline values to those who dropped out.
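Two of the data-preparation steps above can be illustrated with a short sketch. The text does not give the formulas used, so both functions are assumptions: the first pools two dose arms with the usual formula for combining the means and SDs of two groups, and the second assigns the arm's baseline mean to every dropout. All names and numbers are hypothetical.

```python
import math


def combine_dose_arms(n1, mean1, sd1, n2, mean2, sd2):
    """Pool two dose arms of the same drug into a single arm.

    Assumption: standard formula for combining two groups; the paper does
    not state how doses were combined.
    """
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    variance = (
        (n1 - 1) * sd1 ** 2
        + (n2 - 1) * sd2 ** 2
        + (n1 * n2 / n) * (mean1 - mean2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)


def final_mean_with_dropouts_at_baseline(
    n_randomised, n_completed, baseline_mean, final_mean_completers
):
    """End-of-follow-up mean when dropouts are assigned baseline values.

    Assumption: dropouts share the arm's baseline mean.
    """
    n_dropped = n_randomised - n_completed
    total = n_completed * final_mean_completers + n_dropped * baseline_mean
    return total / n_randomised


# Hypothetical arm: 50 randomised, 40 completed, pain falls from 60 to 30.
print(final_mean_with_dropouts_at_baseline(50, 40, 60.0, 30.0))
```

Assigning baseline values to dropouts pulls the adjusted mean towards no change, so it gives a conservative estimate of the treatment effect.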
We used Bayesian network random-effects meta-analysis to synthesise the data for each outcome. This method allows the estimation of indirect effects between two drugs on the basis of observed direct effects based on the model of consistency, that is, the relative effect of drugs A and C is the difference of the relative effects of drugs A and B and drugs B and C if these are directly observed. The Bayesian model was constructed under this assumption for the set of drugs that form a connected network, that is, each drug is connected to every other drug in the set by either a direct or computable indirect effect given the trials included in the analysis. The models were optimised and estimates were obtained using Markov Chain Monte Carlo methods, with weighting for sample size. The analyses of pain and morning stiffness were performed under the model assumption of a normal (Gaussian) likelihood and conjugate priors, which were used to compute the posterior distribution of the effect of each NSAID compared with placebo (or another NSAID). 12 We presented these results as relative effect sizes of the two interventions with 95% credible intervals. We analysed safety outcomes using a Poisson model, where the rate of events (the hazard) was modelled accounting for study sample sizes. Results were presented as relative risks of the two interventions being compared, in the form of log (HR). We assumed that the hazard did not depend on the duration of each study. All computations were done using R (V.3.1.2) package gemtc (V.0.6), 13 14 along with the Markov Chain Monte Carlo engine JAGS (V.3.4.0). 15 We performed three subgroup analyses to address trial heterogeneity: first excluding cross-over trials without an intervening washout period; second excluding trials without intention-to-treat analysis; and third, stratifying by trial duration.
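The consistency assumption can be made concrete with a small sketch of an indirect comparison through a common comparator. This is a simplified, frequentist illustration with a normal-approximation interval, not the Bayesian random-effects model fitted with gemtc and JAGS. The function name, the sign convention (each effect is drug minus placebo) and the numbers are all assumptions made for the example.

```python
import math


def indirect_comparison(d_pa, se_pa, d_pc, se_pc, z=1.96):
    """Indirect effect of drug C versus drug A via placebo P.

    d_pa, d_pc: effects of A and C relative to placebo (drug minus placebo).
    Under consistency, d_ac = d_pc - d_pa, and the variances of the two
    independent direct estimates add.
    """
    d_ac = d_pc - d_pa
    se_ac = math.sqrt(se_pa ** 2 + se_pc ** 2)
    interval = (d_ac - z * se_ac, d_ac + z * se_ac)
    return d_ac, se_ac, interval


# Hypothetical SMDs versus placebo (not estimates from this analysis).
effect, se, (low, high) = indirect_comparison(-1.0, 0.3, -1.5, 0.4)
print(effect, se, low, high)
```

Because the variances add, an indirect estimate is always less precise than either direct estimate it is built from. This is one reason why a network of small trials yields wide intervals for drug-to-drug comparisons.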
Some trials tested less than full daily doses of NSAIDs. Therefore, for the pain outcome, we also performed a subgroup analysis of trials that used full doses, including trials of celecoxib 400 mg daily, diclofenac 150 mg daily, etoricoxib 90 mg daily or more, indomethacin 100 mg daily or more, ketoprofen 200 mg daily or more, meloxicam 15 mg daily or more, naproxen 1000 mg daily and phenylbutazone 400 mg daily or more. For NSAIDs evaluated in a single dosage (eg, piroxicam), we included trials of the single dosage. Except for indomethacin, the full-dose analysis included trials with doses equal to or higher than '100' in the Assessment of Spondyloarthritis International Society (ASAS)-NSAIDs equivalent score. 16
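The full-dose criteria listed above can be written as a lookup table. The sketch below is illustrative only: the names are hypothetical, and doses stated without "or more" (celecoxib, diclofenac, naproxen) are treated as exact matches, which is an assumption about how the criteria were applied.

```python
# Daily dose in mg and whether the text says "or more" for that drug.
FULL_DOSE_MG_PER_DAY = {
    "celecoxib": (400, False),
    "diclofenac": (150, False),
    "etoricoxib": (90, True),
    "indomethacin": (100, True),
    "ketoprofen": (200, True),
    "meloxicam": (15, True),
    "naproxen": (1000, False),
    "phenylbutazone": (400, True),
}


def is_full_dose(drug, daily_dose_mg):
    """Return True or False for listed drugs, or None if the drug is not listed.

    Drugs studied at a single dosage (eg, piroxicam) are not listed here;
    the text includes their trials as they are.
    """
    rule = FULL_DOSE_MG_PER_DAY.get(drug.lower())
    if rule is None:
        return None
    threshold, or_more = rule
    if or_more:
        return daily_dose_mg >= threshold
    return daily_dose_mg == threshold


print(is_full_dose("etoricoxib", 120), is_full_dose("celecoxib", 200))
```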

Literature review
Of the 63 trials identified by literature review, we excluded 37, primarily because no relevant efficacy outcomes or measures of variation were reported (see online supplementary figure S1). We included 26 trials with 66 treatment arms and 3410 participants in the network meta-analysis (table 1). 5-8 17-38 Twenty different NSAIDs were compared in these trials. The sample sizes ranged from 19 to 611; 15 trials (58%) had fewer than 50 participants. Sixteen studies reported parallel controlled trials, and 10 were cross-over trials. Among the 10 cross-over trials, five trials clearly reported a washout period before the cross-over. An intention-to-treat analysis was reported in 10 trials (38%). The mean age of trial participants was 41.3 years and the mean duration of AS was 10.8 years.
The study quality was moderate to high (see online supplementary figure S2). Eight studies (30%) had high risk of bias due to incomplete outcome data, and three (11.5%) had high risk of bias in blinding.

Comparative effects on pain
Twenty-five trials (64 arms, 3370 participants) were included in the analysis of pain. Ninety per cent of baseline pain scores were between 37 and 78 (converted to a 0-100 scale) and did not differ among NSAIDs (see online supplementary figure S3). The network of treatment comparisons from these trials is shown in figure 1. All NSAIDs were numerically more efficacious than placebo in reducing pain severity, with effect sizes ranging from −0.65 to −2.2 (figure 2A). Fifteen NSAIDs were statistically superior to placebo, including etoricoxib, oxaprozin, diflunisal, isoxicam, phenylbutazone, feprazone, naproxen, indomethacin, tolmetin, piroxicam, meloxicam, diclofenac, sulindac, celecoxib and ketoprofen. In paired comparisons, etoricoxib was significantly more effective than celecoxib, ketoprofen and tenoxicam, with relative effect sizes of −1.08 (95% Credible Interval −2.14, −0.05), −1.27 (95% Credible Interval −2.46, −0.12) and −1.55 (95% Credible Interval −2.77, −0.36), respectively. Subgroup analyses of trials reporting intention-to-treat analyses and excluding cross-over trials without an appropriate washout had similar results (see online supplementary figures S4A and S4B). In the stratified analysis by trial duration, we compared seven NSAIDs assessed in shorter (≤6 weeks) and longer trials (≥8 weeks), including indomethacin, ketoprofen, piroxicam, tenoxicam, naproxen, diclofenac and celecoxib, and did not detect a significant difference in their relative efficacy (see online supplementary figure S5).
The analysis of full-dose NSAID trials included 16 trials (39 study arms, 2530 participants, see online supplementary figure S6). We could not include diflunisal, fenoprofen or phenylbutazone due to lack of direct comparisons. All 14 NSAIDs in this analysis were numerically better than placebo in decreasing pain severity, with relative effect sizes ranging from −0.71 to −2.23, and 10 were significantly better than placebo (figure 2B). In paired comparisons, etoricoxib was the only medication that showed significant differences relative to other NSAIDs, and was more efficacious than all other NSAIDs except diclofenac and feprazone (see online supplementary table S2). While the relative effect estimates between etoricoxib and the other NSAIDs were similar in the main analysis and the full-dose subgroup analysis, the credible intervals were narrower in the full-dose subgroup analysis, resulting in more significant differences between etoricoxib and the other NSAIDs.
Table 1 footnotes: †For cross-over trials that reported results before and after second washout, the results after washout were recorded separately. CO, cross-over trials; ITT, intention-to-treat; MD, mean difference; P, parallel trials; SMD, standardised mean difference.

Comparative effects on morning stiffness
Fifteen trials of 13 NSAIDs (39 arms, 1516 participants) were included (see online supplementary figure S7). Although all 13 NSAIDs reduced the duration of morning stiffness more than placebo, none of these reductions was statistically significant (figure 3).

Comparative risks of adverse events
The analysis of total AE included 25 trials of 19 NSAIDs, and the analysis of GI AE included 21 trials of 16 NSAIDs. All NSAIDs except fenoprofen had similar relative risks of total AEs, compared either with placebo (mean log (HR) from −0.69 to 1.26) or with each other (figure 4A). No AEs were reported for fenoprofen in the one trial in which it was studied. Meloxicam was not included in this analysis due to missing data.
No NSAID except sulindac demonstrated a significant difference in the risk of GI AE compared with placebo (mean log (HR) from −0.41 to 2.39) or with other NSAIDs in these short-term trials (figure 4B). Sulindac had significantly higher risks of GI AE compared with placebo and celecoxib. However, it was assessed in one study and no GI AE was recorded for its direct comparator. Only 6 cases of GI bleeding were reported in 26 trials (diclofenac 1, indomethacin 1, ketoprofen 1, naproxen 2, phenylbutazone 1), too few for statistical analysis.
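Because the safety results are reported on the log (HR) scale, exponentiating them gives hazard ratios that may be easier to read. The sketch below applies this to the ranges of mean log (HR) quoted above. The helper name is hypothetical, and the conversion is offered only as a reading aid, not as part of the original analysis.

```python
import math


def hazard_ratio(log_hr):
    """Convert a log hazard ratio to a hazard ratio."""
    return math.exp(log_hr)


# Ranges of mean log (HR) versus placebo, as quoted in the text.
reported_ranges = [("total AE", -0.69, 1.26), ("GI AE", -0.41, 2.39)]
for label, low, high in reported_ranges:
    print(f"{label}: HR {hazard_ratio(low):.2f} to {hazard_ratio(high):.2f}")
```

A log (HR) of 0 corresponds to a hazard ratio of 1, that is, no difference from placebo.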

Summary of relative efficacy and safety
A summary of relative efficacy in pain reduction and risk of total AE of each NSAID compared with naproxen, a commonly used NSAID, is presented in figure 5. Although the differences were not statistically significant, etoricoxib and oxaprozin appeared to be more efficacious and to have a lower risk of total AE than naproxen, while indoprofen and sulindac tended to be less efficacious with a higher risk of AE.

DISCUSSION
NSAIDs are used regularly by 68% to 83% of patients with AS. 3 39-41 In the mid-1980s, indomethacin was the most commonly used NSAID among patients with AS in the UK, accounting for 35% of prescriptions. 4 In the mid-1990s in California, indomethacin was used by 25% of patients, 3 and in Germany and Austria in 2000, it was used by only 14% of patients. 40 Other commonly prescribed NSAIDs included naproxen, diclofenac, ibuprofen, piroxicam, and more recently etoricoxib, used by 7.6% of patients in Sweden. 4 41-43 Despite their wide use, few studies have examined the relative effectiveness of NSAIDs in AS. While one study reported that patients treated with indomethacin had greater subjective improvement than those treated with other NSAIDs, another study found no differences among NSAIDs in the likelihood of major symptom improvement (ie, at least 50% better). 4 41 We performed the current comparison to provide additional information on this question.
For pain reduction, we found that 15 NSAIDs were significantly better than placebo, and that etoricoxib was significantly better than celecoxib, ketoprofen and tenoxicam. There were no significant differences among NSAIDs in decreasing morning stiffness or in the risk of AE. When considering only the magnitude of effects, etoricoxib and oxaprozin seemed to be more efficacious in reducing pain and had a better short-term safety profile compared with naproxen. In contrast, results for indomethacin were similar to those for naproxen. However, these differences should be interpreted cautiously. Etoricoxib and oxaprozin were each assessed in only one trial. Despite the rigorous literature search, it is possible that the results are affected by publication bias. Trials that did not demonstrate that an experimental drug was superior or comparable to a comparator such as indomethacin might be particularly susceptible to publication bias. Therefore, further studies of these medications may not confirm this result. Although we conducted a subgroup analysis of trials that used full dose NSAIDs, no studies tested indomethacin at 150 mg daily.
Figure 2 Relative effect size of each NSAID compared with placebo is represented as standardised mean difference (SMD) with 95% credible interval (CrI), and is listed in the right column. A negative value means greater reduction in pain compared with placebo.
Figure 3 Change of duration of morning stiffness. Relative effect size of each non-steroidal anti-inflammatory drug (NSAID) compared with placebo is presented as mean difference (MD) with 95% credible interval (CrI), and is listed in the right column. A negative value means greater decrease in duration of morning stiffness compared with placebo.
We chose pain severity and duration of morning stiffness as the efficacy outcomes because pain and stiffness are the most common symptoms in patients with AS. Pain and stiffness are common indications for NSAIDs and were often used as trial end points. Only one study reported the Bath Ankylosing Spondylitis Disease Activity Index and four reported Bath Ankylosing Spondylitis Functional Index, so these measures could not be used in this study. We did not find that any NSAID significantly improved the duration of morning stiffness relative to placebo, even though the most efficacious medications reduced morning stiffness by approximately 1 h. This result likely reflects the poor sensitivity to change of this measure and the small samples. In contrast, significant differences relative to placebo were observed for pain, reflecting the better sensitivity of this measure.
Figure 4 Results are presented as relative risk for each non-steroidal anti-inflammatory drug (NSAID) compared with placebo, in the form of log (HR) with 95% credible interval (CrI). A log (HR) of 0 implies no difference between the drug and placebo; a negative log (HR) implies the treatment has a lower risk of adverse events than placebo.
Figure 5 Summary of relative efficacy and safety of non-steroidal anti-inflammatory drugs (NSAIDs). The y axis represents the relative efficacy in reducing pain compared with naproxen (arrow), measured by standardised mean difference (SMD) of the change in pain score; a negative value means greater reduction in pain. The x axis represents the relative risk for total adverse events compared with naproxen, measured by log (HR); a negative value means the treatment is safer. Therefore, medications in the lower left-hand quadrant are relatively more efficacious and have lower risk of adverse events than naproxen, while medications in the upper right-hand corner are relatively less efficacious and have higher risks of adverse events than naproxen.
Our study did not detect differences among NSAIDs in the risk of total AE or GI AE in these short-term trials, mainly because adverse events were uncommon. Studies with larger samples and longer durations are needed to provide better estimates. We did not assess cardiovascular side effects, which are important concerns given evidence of an increased risk of cardiovascular events in patients with AS. 44 45 In meta-analyses of RCTs in conditions other than AS, COX-2 selective inhibitors were found to have an increased risk for cardiovascular events compared with placebo. 46 In a recent population-based observational study of patients with AS, no differences in cardiovascular adverse events were detected among users of etoricoxib, celecoxib and non-selective NSAIDs, although selective prescribing based on pre-existing risks likely affected these comparisons. 43 These considerations should be weighed when recommending particular NSAIDs.
This is the first study to use network meta-analysis to compare NSAIDs in AS. Network meta-analysis integrates direct evidence from RCTs and indirect evidence through a common comparator, 47 and therefore it allows a comprehensive comparison of all available interventions. In addition to the limitations associated with conventional pairwise meta-analysis, such as publication bias and trial heterogeneity, an attractive but often misunderstood product of network meta-analysis is the ranking of interventions based on probability output. 48 In our study, we chose to emphasise relative treatment effects, rather than reporting the ranks.
The major limitations of our study are the number of available RCTs and small trial sizes. Ten NSAIDs were studied in a single trial, which may not have afforded a full assessment of these medications. However, the more commonly prescribed NSAIDs, including indomethacin, diclofenac, naproxen, piroxicam and celecoxib, were included in four to eight studies. More than half of the trials had fewer than 50 participants. This resulted in wide credible intervals and may have limited our ability to detect differences among NSAIDs. Further, we only included RCT evidence in this analysis. Observational studies may not have the same results.
In conclusion, in short-term clinical trials, we found no evidence to support differential efficacy among NSAIDs in the treatment of pain or morning stiffness in AS, with the exception of etoricoxib. Based on the results of one trial, etoricoxib was more efficacious in the treatment of pain than several other NSAIDs. Further studies, ideally a large multiarm trial, are needed to test the relative efficacy and safety of NSAIDs in AS. Our results suggest the particular medications that would be most promising to study.
Contributors MMW conceived and designed the study. RW and MMW collected the data, and AD did the analysis. RW drafted the manuscript and all authors provided critical review and approval of the final version.
Funding This study was supported by the Intramural Research Program, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health.
Competing interests None declared.
Ethics approval The study was exempted from human subjects review by the Office of Human Subjects Research, US National Institutes of Health.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement Data are in the public domain.