Comparison of different approaches for dose response analysis

Characterizing an appropriate dose-response relationship and identifying the right dose in a clinical trial are two main goals of early drug development. MCP-Mod is one of the pioneering approaches developed within the last 10 years that combines modeling techniques with multiple comparison procedures to address these goals in clinical drug development. The MCP-Mod approach begins with a set of potential dose-response models, tests for a significant dose-response effect (proof of concept, PoC) using multiple linear contrast tests, and selects the "best" model among those with a significant contrast test. A disadvantage of the method is that the parameter values of the candidate models need to be fixed a priori for the contrast tests. This may lead to a loss in power and unreliable model selection. For this reason, several variations of the MCP-Mod approach and a hierarchical model selection approach have been suggested in which the parameter values need not be fixed in the proof of concept testing step and can be estimated after the model selection step. This paper provides a numerical comparison of the different MCP-Mod variants and the hierarchical model selection approach with regard to their ability to detect the dose-response trend, their potential to select the correct model, and their accuracy in estimating the dose-response shape and minimum effective dose. Additionally, as one of the approaches is based on two-sided model comparisons only, we make it more consistent with the common goals of a PoC study by extending it to one-sided comparisons between the constant and alternative candidate models in the proof of concept step.


INTRODUCTION
A major concern in drug development is the characterization of the dose-response curve. Phase II dose-finding studies often aim to detect a dose-response trend and simultaneously estimate the dose-response relationship. The goal of such proof of concept (PoC) studies is to determine whether to continue with the drug-development process and which dose should be selected for later stages.
Dose-finding studies were historically analyzed by two approaches: (i) multiple comparison procedures (MCP) and (ii) modeling techniques. Under the MCP approach, each active dose is compared to placebo and PoC is established using multiple testing methods (Hothorn, Neuhäuser, & Koch, 1997; Stewart & Ruberg, 2000). Once PoC is established, the minimum active dose with a statistically significant outcome and clinically relevant effect compared to placebo is selected as the minimum effective dose (MED) (Ruberg, 1995; Tamhane, Hochberg, & Dunnett, 1996; Tamhane & Logan, 2002). Such procedures are relatively robust to the underlying dose-response relationship and are most effective when there are few active doses in the study, but they are not appropriate for characterizing the dose-response relationship: dose-response inference is confined to the distinct doses administered in a given trial. The modeling approach, on the other hand, establishes PoC by comparing the flat dose-response model against the fitted dose-response model (Morgan, 1992; Pinheiro, Bretz, & Branson, 2006b). If PoC is established, the underlying regression model is utilized for further inference on dose-response profiles. Figure 1 shows the data-generating dose-response shapes that are considered most relevant in the context of dose-finding studies. The reliability of the model-based approach depends on the correct specification of the dose-response regression model, which is not known at the beginning of the trial.
Motivated by the shortcomings of the historical approaches, Bretz, Pinheiro, and Branson (2005) proposed an integrated approach, namely MCP-Mod (Multiple Comparison Procedures with Modeling). This approach combines multiple comparisons and modeling techniques and comprises three steps: (i) Start with a set of potential parametric dose-response models and conduct multiple linear contrast tests for a dose-related effect (PoC). (ii) Once PoC is established, obtain the best model (if any) from the set of models with significant contrasts, either by model selection criteria or by model averaging. (iii) Upon selecting the best model, estimate the adequate doses based on regression techniques. An implementation of MCP-Mod and its application is illustrated in Pinheiro, Bornkamp, and Bretz (2006a). MCP-Mod was recently qualified by the European Medicines Agency as an efficient statistical methodology for model-based design and analysis of phase II dose-finding studies under model uncertainty (EMA, 2014). However, a major deficiency of the MCP-Mod approach is that it makes prior assumptions about the values of the model parameters used in the candidate set. These unknown parameters, if wrongly specified, may lead to inefficient inferences in the dose-finding step.
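As a minimal sketch of step (i), the fragment below computes a single-contrast t-statistic from guesstimated candidate mean vectors; with equal group sizes, the optimal contrast is proportional to the centered mean vector. The multiplicity adjustment via the multivariate t distribution used in MCP-Mod is omitted, and all numbers are illustrative:

```python
import numpy as np

def contrast_stat(ybar, n, s, mu_guess):
    """One-sided contrast t-statistic for a single candidate shape.

    ybar: observed group means; n: common group size; s: pooled SD estimate;
    mu_guess: guesstimated mean responses per dose under the candidate model.
    With equal group sizes the optimal contrast is the centered mean vector.
    """
    c = mu_guess - mu_guess.mean()
    c = c / np.linalg.norm(c)
    return float(c @ ybar / (s * np.sqrt(np.sum(c ** 2) / n)))

def poc_max_stat(ybar, n, s, candidates):
    """Maximum contrast statistic over the candidate set (the MCP step)."""
    return max(contrast_stat(ybar, n, s, mu) for mu in candidates)
```

Since every contrast sums to zero, a flat response profile yields a statistic of zero regardless of the guessed shape.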
In this article, we discuss three dose-response profile estimation methods that address the shortcomings of the MCP-Mod method. The first approach, suggested by Dette, Titoff, Volgushev, and Bretz (2015), is a variation of the MCP-Mod approach that uses likelihood ratio tests instead of the prespecified contrast tests used in the original MCP-Mod. Thus, this method does not need prior knowledge of the parameters of the candidate models. It is referred to as the multiple comparison likelihood ratio asymptotic (MCLRa) method in the rest of the paper. The second approach, proposed by Baayen, Hougaard, and Pipper (2015), compares and evaluates a candidate set of nested, plausible monotone dose-response shapes against the constant model of no effect, and PoC is established if the constant model is rejected. The candidate models are also compared among each other via a sequential testing procedure to arrive at one final model that best fits the data under the given setup. This is referred to as the multiple hierarchical modeling and comparison (MHMC) method in the rest of the paper. The MCLRa and MHMC methods both use the asymptotic distribution of log-likelihood ratio statistics for their PoC testing. The third approach, proposed by Gutjahr and Bornkamp (2017), considers the exact distribution of the LR test in the PoC testing step. Here, the authors use LR test statistics for testing the null hypothesis that there is no dose-response trend against the composite alternative that one of the candidate dose-response shapes is true. This is referred to as the multiple comparison likelihood ratio exact (MCLRe) method in the rest of the paper. The MCLRa and MCLRe methods test for evidence of a dose-response trend and estimate the target dose in a manner similar to MCP-Mod; they are therefore referred to as MCP-Mod variants in the rest of the paper.
The three methods are compared in this paper among each other and with the original MCP-Mod approach with respect to their capability to: (i) detect a nonzero dose-response trend, (ii) identify the correct model, (iii) estimate the dose-response shape, and (iv) estimate a target dose such as the minimum effective dose (MED). The rest of the paper is organized as follows: The dose-finding methods compared in this paper are reviewed in Section 2. Section 3.1 describes the design of the simulation studies and Section 3.2 describes the performance metrics used in the comparison of the different methods. The results of the analyses are presented in Section 3.3. The paper concludes with a discussion in Section 4.

Basic modeling framework
Consider a random vector containing the clinical measurement of interest for each of the patients. The response is observed for $k + 1$ parallel dose groups and is assumed to follow a certain dose-response curve with normally distributed errors:

$$Y_{ij} = f_m(d_i, \theta) + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma^2),$$

where $f_m(d, \theta)$ denotes the regression function under model family $m$ and dose $d$. Here $d_0$ denotes the placebo dose, $d_1, d_2, d_3, \ldots, d_k$ are the active doses investigated in the parallel-group design, and $\theta = \{\theta_0, \theta_1, \theta^{\circ};\ \theta_0, \theta_1 \in \mathbb{R},\ \theta^{\circ} \in \Gamma \subseteq \mathbb{R}^{p-2}\}$ refers to the vector of model parameters.
$Y_{ij}$ refers to the response of the $j$th patient in the $i$th dose group, where $n_i$ is the number of patients in dose group $i$. Table 1 shows examples of linear or nonlinear transformations of the dose variable that can be considered in $f_m(d)$.
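A minimal simulation of this framework (the Emax parameter values and the seed are illustrative, not taken from any table in the paper):

```python
import numpy as np

rng = np.random.default_rng(2024)
doses = np.array([0.0, 0.05, 0.20, 0.60, 1.0])  # dose design used later in the paper

def emax(d, e0=0.2, e_max=0.72, ed50=0.2):
    """Emax regression function f(d, theta) = e0 + e_max * d / (ed50 + d)."""
    return e0 + e_max * d / (ed50 + d)

n_per_group, sigma = 25, 1.478
# Y_ij = f(d_i, theta) + eps_ij, eps_ij ~ N(0, sigma^2)
Y = {float(d): emax(d) + rng.normal(0.0, sigma, size=n_per_group) for d in doses}
```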

Choice of candidate set
The methods discussed in this paper consider a set of parametric model families, known as the candidate set of dose-response models, to capture the model uncertainty more precisely. The candidate set is denoted by $\mathcal{M} = \{M_m : m = 1, 2, \ldots, M\}$, and the mean response vector for the $m$th model family is denoted by $\mu_m = (\mu_{m,0}, \ldots, \mu_{m,k})$, where $\mu_{m,i}$ is the mean response corresponding to dose group $i$ if $M_m$ is the correct model family. Table 1 shows a few examples of parametric models in the candidate set used in the MCLRa and MCLRe methods, and Table 2 shows an example of a set of nested, monotone candidate models used in the MHMC method. Three different candidate sets are considered in this paper for evaluating the different methods; see Table 3. The motivation for choosing each candidate set is explained below:
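Typical members of such a candidate set can be written as simple functions of dose. The parameterizations below are the standard linear, Emax, and exponential shapes with illustrative placeholder parameters, not the entries of Table 1:

```python
import numpy as np

# f(d, theta) = theta0 + theta1 * g(d, theta_nl) for standard dose-response families
candidate_set = {
    "linear":      lambda d: 0.0 + 1.0 * d,
    "emax":        lambda d: 0.0 + 1.0 * d / (0.2 + d),            # ED50 = 0.2
    "exponential": lambda d: 0.0 + 1.0 * (np.exp(d / 0.5) - 1.0),  # delta = 0.5
}

doses = np.array([0.0, 0.05, 0.20, 0.60, 1.0])
mean_vectors = {name: f(doses) for name, f in candidate_set.items()}
```

Each family induces a mean response vector over the dose design; these vectors are what the PoC tests discriminate from a flat profile.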

Candidate Set 1:
In this comparison, only models with three parameters are considered in the candidate sets of the MCP-Mod, MCLRa, and MCLRe methods, because PoC evaluation of the MCLRe method is time consuming for models with more parameters. The MHMC method uses the candidate set shown in Table 2. This candidate set was also used for the MHMC method in the original publication (Baayen et al., 2015).

Candidate Set 2:
In the second comparison, complex nonlinear models with more parameters, such as the sigmoidal Emax (sigEmax) model, are also considered in the candidate sets of the MCP-Mod and MCLRa methods. The objective is to check whether any further information is gained by widening the candidate set of these methods. For the MHMC method, the candidate set remains the same as in Table 2. MCLRe is no longer considered in this comparison because its performance is similar to that of MCLRa. Additionally, the MCLRe method is more time consuming compared to the other methods (see Table 4).

Candidate Set 3:
In the third candidate set (see Table 3), the power model in Table 2 is replaced by the emax model for the MHMC method. By this, the candidate sets of MHMC and MCLRa become quite similar, as the four-parameter logistic (4PL) model (see Table 2) and the sigmoidal Emax model lead to the same class of dose-response models. Moreover, the nested nature of the candidate set is preserved. This was already mentioned in Baayen et al. (2015), but not investigated. The linlog model is excluded from the candidate sets of the MCP-Mod and MCLRa methods since emax and linlog are very close to each other and the emax dose-response shape is more common in Phase II dose-response studies.

Method based on LR contrast test (MCLRa)
The method suggested by Dette, Titoff, Volgushev, and Bretz (2015) is a variation of the MCP-Mod approach, as already mentioned in the introduction. It uses likelihood ratio tests instead of the prespecified contrast tests in the PoC testing step; this is referred to as the LR contrast test in their paper (Dette et al., 2015). Given that the model specifications of the earlier section hold and the objective is to choose a model among the candidate models in $\mathcal{M}$, the hypothesis tested to detect a dose-response signal under model $m$ is

$$H_{0,m}: \mu \in \{\theta_0 \mathbf{1} : \theta_0 \in \mathbb{R}\} \quad \text{versus} \quad H_{1,m}: \mu_i = f_m(d_i, \theta_m), \quad i = 0, \ldots, k.$$

Let $\|\cdot\|_2$ denote the Euclidean norm, and let $\hat{\mu}_0$ and $\hat{\mu}_m$ denote the fitted mean vectors under the constant model and under model $m$, respectively. The LR test statistic for testing the above hypothesis is given by

$$T_{n,m} = n \log \frac{\|Y - \hat{\mu}_0\|_2^2}{\|Y - \hat{\mu}_m\|_2^2}, \qquad (1)$$

where $n$ is the total sample size; this statistic admits a representation as a supremum of suitably normalized contrasts $c_m = (c_{m,0}, \ldots, c_{m,k})^T \in \mathbb{R}^{k+1}$ induced by model $m$, which motivates the name LR contrast test. The LR test rejects the null hypothesis of no dose response for large values of the test statistic $T_{n,m}$. To conclude in favor of a statistically significant dose-response signal, the individual LR test statistics are combined into a maximum statistic

$$T_n = \max_{1 \le m \le M} T_{n,m}.$$

In classical likelihood theory, the statistic $T_{n,m}$ usually converges weakly to a $\chi^2$ distribution, provided the parameters characterizing the null distribution are unique. But it is well known that classical asymptotic theory is not applicable when parameters are unidentifiable under the null hypothesis of the above setup (Andrews & Ploberger, 1994; Chernoff, 1954; Davies, 1987; Lindsay, 1995). The authors applied nonstandard asymptotic results (Liu & Shao, 2003) to address the nonidentifiability in the LR contrast tests. They have shown that the asymptotic distribution of the likelihood ratio test statistic (1) corresponding to each model can be approximated by a functional of a stochastic process, and that the covariance between the likelihood ratios of two models can be approximated by the covariance between two stochastic processes, which has a closed-form kernel. The derivations are described in detail in the original manuscript (Dette et al., 2015). The PoC test then rejects if $T_n$ exceeds the $(1-\alpha)$-quantile of the limiting distribution of $T_n$.
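A generic Gaussian LR statistic for the constant model versus an Emax alternative can be sketched numerically as follows; the exact statistic and its nonstandard critical values are given in Dette et al. (2015), and the grid, data, and parameters below are illustrative only:

```python
import numpy as np

def rss_constant(y):
    """Residual sum of squares under the constant (no-effect) model."""
    return float(np.sum((y - y.mean()) ** 2))

def rss_emax(y, d, ed50_grid):
    """Profile the nonlinear ED50 over a grid; fit (theta0, theta1) by least squares."""
    best = np.inf
    for ed50 in ed50_grid:
        X = np.column_stack([np.ones_like(d), d / (ed50 + d)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        best = min(best, float(np.sum((y - X @ beta) ** 2)))
    return best

def lr_stat(y, rss_alt):
    """Gaussian -2 log likelihood ratio: n * log(RSS_0 / RSS_1)."""
    return len(y) * np.log(rss_constant(y) / rss_alt)

rng = np.random.default_rng(7)
doses = np.repeat([0.0, 0.05, 0.2, 0.6, 1.0], 25)
y = 0.2 + 0.72 * doses / (0.2 + doses) + rng.normal(0.0, 0.65, doses.size)

T_emax = lr_stat(y, rss_emax(y, doses, np.linspace(0.05, 1.0, 40)))
# the PoC statistic would be the maximum of such statistics over all candidate models
```

Because the constant model is nested in each alternative fit, the statistic is nonnegative by construction.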
Following the PoC testing step, model selection in this method can be based on P-values or on a model selection criterion such as the Akaike Information Criterion (AIC). Model selection based on P-values has been criticized in the literature (Baayen et al., 2015), since the models are not compared with each other when selection is based on the smallest P-value; model selection based on the AIC criterion is therefore used in this paper. Thus, if any significant result is obtained in the PoC testing step, the "best" model is selected from the set of significant models, and this model is then deployed to obtain the target dose(s) (the MED in this case) using inverse regression techniques.
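For Gaussian errors the AIC can be computed directly from the residual sum of squares. A small sketch of the selection step, where the RSS values and parameter counts are made up for illustration:

```python
import numpy as np

def aic_gaussian(rss, n, p):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2*(p + 1) (+1 for sigma^2)."""
    return n * np.log(rss / n) + 2 * (p + 1)

# hypothetical fits that passed the PoC step: model -> (RSS, number of parameters)
fits = {"linear": (61.0, 2), "emax": (55.0, 3), "sigEmax": (54.6, 4)}
n = 125
aics = {m: aic_gaussian(rss, n, p) for m, (rss, p) in fits.items()}
best = min(aics, key=aics.get)
```

Here the sigEmax fit barely improves the RSS over emax, so its extra parameter is penalized and the simpler emax model wins; this parsimony effect reappears in the model selection results later in the paper.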

Method based on LR exact test (MCLRe)
As discussed in the earlier section, the parametric models permissible in the candidate sets of the LR contrast test and the LR exact test method are similar. The null hypothesis tested here is that of a constant dose-response profile,

$$H_0: \mu \in \{\theta_0 \mathbf{1} : \theta_0 \in \mathbb{R}\},$$

against the one-sided alternatives $\theta \in \Theta_1, \ldots, \theta \in \Theta_M$ corresponding to the candidate models $m = 1, \ldots, M$. The goal is to derive the likelihood-ratio test of $H_0$ against the union of these alternatives. For a single model, the problem can be written as $H_{0,m}: Y \sim N(\theta_0 \mathbf{1}, \sigma^2 I)$ against $H_{1,m}: Y \sim N(f_m(d, \theta_m), \sigma^2 I)$. The authors showed that the LR test statistic for this problem simplifies to a supremum, over the nonlinear parameter $\theta^{\circ}$ in its parameter space $\Gamma$, of a function of the projected data $BY$, where the rows of the $k \times (k+1)$ matrix $B$ form an orthonormal basis for the linear subspace $V = \{x \in \mathbb{R}^{k+1} : \mathbf{1}^T x = 0\}$. The LR test rejects if the likelihood ratio is small enough or, equivalently, since the transformation involved is non-increasing, if the corresponding statistic $T_m$ is large enough. The authors started with the computation of the P-value for a single one-sided alternative with a single dose-response model and extended their method to multiple dose-response models. For multiple alternatives with multiple dose-response models, the test statistic used is

$$T = \max_{1 \le m \le M} T_m.$$

The P-value of the LR test is given by $P_{H_0}(T > t)$, where $t$ is the observed value of $T$ and $P_{H_0}(\cdot)$ denotes probability calculated under the null hypothesis $H_0$. The authors have shown that the exact distribution of the test statistic for PoC can be determined using results from differential geometry. Further details are given in the original paper (Gutjahr & Bornkamp, 2017).
To the best of our knowledge, there is only limited literature on exact LR tests involving nonlinear models. The approach used here can be applied beyond dose-response modeling: trend tests involving nonlinear models arise in many areas of the applied sciences, such as change-point analysis, and an approach similar to the MCLRe method can be used to compute the distribution of the test statistic in that scenario. This approach also provides a way to evaluate the power of the proposed test under the alternative hypothesis, thereby enabling the sample size calculations that are essential for clinical trials. The model selection and dose-response inference steps of this method are the same as in the MCLRa approach.

Method based on hierarchical model testing (MHMC)
As shown in the earlier section, the choice of candidate set for this method is not as flexible as for the earlier approaches, because monotonicity and nestedness are required of the candidate set used in this method. Table 2 gives an example of a candidate set of models that can be used in the MHMC method. However, the candidate set is not fixed; any set of nested, monotone models can be considered. The testing procedure starts by comparing the constant model to the more complex models in the candidate set and proceeds with the testing and comparison of increasingly complex models in the subsequent steps: model $M_j$ is compared with the more complex models $M_{j+1}, \ldots, M_J$, starting with $j = 0$ and going up to $j = J - 1$, where the models are ordered according to their complexity.
The fixed-sequence test mentioned above evaluates PoC in the first step and, if a significant effect is found, proceeds to select the final model from the candidate set. The test statistics used are

$$T_{j,l} = \frac{2\,(\ell_l - \ell_j)}{p_l - p_j},$$

with $p_j$ and $p_l$ the number of parameters in models $M_j$ and $M_l$, respectively, and $\ell_j$ and $\ell_l$ the log-likelihoods of models $M_j$ and $M_l$, respectively, evaluated at the maximum likelihood estimates of the model parameters of the respective model. Note that, to account for overfitting, the above method penalizes large differences in the number of model parameters through the choice of the denominator in the test statistic. The virtue of this method lies in the fact that both testing and model selection are done in the same statistical framework and model parameters are not specified at the beginning. Similar to the MCLRa approach, the authors have also used nonstandard asymptotic results to obtain the asymptotic distribution of the log-likelihood ratio tests under nonidentifiability. They presented in their original paper (Baayen et al., 2015) a characterization of the distribution of the log-likelihood ratio test statistic for the two-sided PoC test under $H_0$, when the requirement $\theta_1 = 0$ renders the other nonlinear parameters unidentifiable. We have derived a similar characterization of the distribution of the log-likelihood ratio test statistic for one-sided PoC testing, which is presented in Appendix A in the Supporting Information. The primary motivation behind this derivation is to make the PoC comparison consistent across all the methods. The authors also considered an unrestricted model, $M_4$, shown in Table 2. They have shown in their analyses that inclusion of the unrestricted model protects the method against model misspecification and increases the power to detect the dose-response trend if the true dose-response shape (a 4PL model or any model nested in the 4PL model, for instance) is not covered by the candidate set.
However, we could not find sufficient evidence in support of the above claim in our simulations. Further, a non monotone dose-response shape is not very common in clinical trials so it might not be necessary to safeguard the method against a non-monotone shape. However, the unrestricted model 4 might be useful to increase robustness with regards to the assumed candidate models.
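A naive sketch of such a fixed-sequence procedure over a sequence of increasingly complex Gaussian models (constant, linear, Emax with profiled ED50; large ED50 values make the Emax family approximately contain the linear shape). For simplicity, each model is compared only with its immediate successor, and the standard chi-square reference value stands in for the nonstandard limiting distributions derived in the papers; everything here is illustrative:

```python
import numpy as np

def fit_rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def hierarchical_select(y, d):
    """Fixed-sequence selection over increasingly complex models."""
    n = d.size
    ones = np.ones(n)
    rss = {
        "constant": fit_rss(y, ones[:, None]),
        "linear":   fit_rss(y, np.column_stack([ones, d])),
        # Emax with profiled ED50; large ED50 values approximate the linear shape
        "emax":     min(fit_rss(y, np.column_stack([ones, d / (e + d)]))
                        for e in np.geomspace(0.05, 25.0, 40)),
    }
    order = ["constant", "linear", "emax"]
    crit = 3.84  # chi-square(1) 95% quantile, a naive stand-in reference value
    selected = order[0]
    for simpler, richer in zip(order, order[1:]):
        lr = n * np.log(rss[simpler] / rss[richer])  # Gaussian -2 log LR
        if lr > crit:
            selected = richer
        else:
            break
    return selected
```

The first comparison in the sequence is the PoC test against the constant model; the procedure stops at the first nonsignificant step.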

SIMULATION STUDIES
In this section, we present the simulation study for the comparison of the MCP-Mod variants and the MHMC method introduced in Section 2. We compare them with respect to their power to detect a nonflat dose-response shape (PoC). We also consider their ability to select the right model from the candidate set and study the model selection behavior when the true model belongs to the candidate set. Since the dose-response shape may be approximated sufficiently well even when the wrong model has been selected, we also consider the cumulative absolute deviation between the estimated and true model shapes. We compare this prediction error to the error that could have been achieved had the right model family (with unknown parameter values) been known from the very beginning. Moreover, we also investigate how well a dose-response shape can be approximated if the true shape is not a member of the candidate set. For instance, the exponential model is not included in Candidate Set 3 (see Table 3), but we simulated data from an exponential model (see Table 5) and compared the closeness between the predicted model and the correct model under the different methods.
Section 3.1 describes the design of the simulation study with the assumptions and the scenarios considered in this article; Section 3.2 describes the metrics used to evaluate the methods' performance. Graphical representations of the statistical performance of each method, based on the simulation results, are shown at the end of the section.

Study design
The study design is based on the design suggested in Bretz et al. (2005). The simulation scenarios are based on a randomized double-blind parallel-group trial with patients allocated to either placebo or one of four active doses coded as 0.05, 0.20, 0.60, and 1. The methods are evaluated across five sample sizes: $n = 10, 25, 50, 75, 100$ per dose group. The response variable is assumed to be normally distributed as specified in Section 2.1. A standard deviation of 1.478 is used in the simulation for assessing the power to detect the trend; this standard deviation gives a power of 80% at a sample size of 75 for a pairwise test between two doses. For the purpose of evaluating the model selection and dose-selection performance of the different methods, a response standard deviation of $\sigma = 0.65$ was used. Two different standard deviations were considered to make the simulation study consistent with the original MCP-Mod article (Bretz et al., 2005), where similar assumptions were adopted when simulating the scenarios. For each dose-response shape and sample size combination shown in the subsequent section, 10,000 simulations are considered.

Data generating shapes
Data are simulated from nine nonlinear models shown in Figure 1 and Table 5. Note that the second emax model, emax2, is not a desirable dose-response model in a typical clinical trial: it does not reach the anticipated efficacy like the other models. However, the methods should identify such extreme situations as well, so it is included in our study. Similarly, the truncated logistic model (tlog) and the quadratic model are included to check how the methods perform for models not included in the candidate set. Note that the first of these two models is monotonic while the second one is not.
All the shapes except the second emax shape (emax2) and the constant shape share the property that the response is 0.2 at the placebo dose $d = 0$ and the maximum response attained is 0.8.
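These two constraints pin down one free parameter per shape. As an illustration, three shapes calibrated so that f(0) = 0.2 and f(1) = 0.8 (the parameter values are chosen to satisfy the constraints, not taken from Table 5):

```python
import numpy as np

shapes = {
    "linear": lambda d: 0.2 + 0.6 * d,
    "emax":   lambda d: 0.2 + 0.72 * d / (0.2 + d),                               # ED50 = 0.2
    "exp":    lambda d: 0.2 + 0.6 * (np.exp(2.0 * d) - 1.0) / (np.exp(2.0) - 1.0),
}

for f in shapes.values():
    # all calibrated shapes start at 0.2 and reach the maximum response 0.8 at d = 1
    assert abs(f(0.0) - 0.2) < 1e-12 and abs(f(1.0) - 0.8) < 1e-12
```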

Performance evaluation of the methods
The main goal of the above methods is to attain PoC and identify the dose-response profile efficiently. To quantify the performance of the different methods in achieving the above, the following performance metrics are analyzed in the simulation study:

Detecting Dose Response Trend (DRT):
The probability of detecting a positive dose-response trend, $\Pr(\mathrm{DRT})$, is estimated as the proportion of simulated trials in which the decision rule concludes in favor of dose-response activity. Under the constant model, $\Pr(\mathrm{DRT})$ gives the Type 1 error rate. We always consider one-sided PoC tests at the 5% significance level.
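The estimator of Pr(DRT) is simply a rejection frequency. As a self-contained illustration under the constant model, the sketch below uses a one-sided z-test of top dose versus placebo with known sigma as a stand-in for the PoC tests compared in the paper:

```python
import numpy as np

def one_trial_rejects(rng, n=10, sigma=1.478, crit=1.645):
    """One simulated trial under the constant model; one-sided z-test at alpha = 0.05."""
    placebo = rng.normal(0.2, sigma, n)
    top     = rng.normal(0.2, sigma, n)  # no dose effect anywhere
    z = (top.mean() - placebo.mean()) / (sigma * np.sqrt(2.0 / n))
    return z > crit

rng = np.random.default_rng(42)
n_sim = 4000
rate = sum(one_trial_rejects(rng) for _ in range(n_sim)) / n_sim
# rate estimates Pr(DRT) under the constant model, i.e., the Type 1 error
```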

Estimating the dose-response profile:
If PoC is established, two measures are computed to evaluate the accuracy of the estimated dose-response shape: the model selection probability and the cumulative absolute prediction error. The model selection probability is estimated as the proportion of simulated trials in which the correct model is selected after PoC is established; the methods are compared with respect to this probability. If the wrong model has been selected from the candidate set, the method can either fail to provide a good estimate of the dose-response shape or still approximate the correct model sufficiently well. Therefore, we suggest investigating the precision of the model shape estimation. As a performance measure we use the cumulative absolute prediction error,

$$\mathrm{PE} = \int_0^1 \left| f(d, \hat{\theta}) - f_0(d, \theta_0) \right| \, \mathrm{d}d,$$

where $f_0(d, \theta_0)$ denotes the mean response at dose $d$ under the actual dose-response model $f_0$ and $f(d, \hat{\theta})$ denotes the predicted response at dose $d$ based on the dose-response model $f$ selected by the dose-finding method (MCP-Mod, the MCP-Mod variants, or the MHMC method). To make the summary statistic dimensionless and to account for the "oracle" estimation error, we consider the ratio $R = \mathrm{PE} / \mathrm{PE}^{0}$, where

$$\mathrm{PE}^{0} = \int_0^1 \left| f_0(d, \hat{\theta}_0) - f_0(d, \theta_0) \right| \, \mathrm{d}d,$$

that is, the cumulative absolute prediction error obtained when the actual model $f_0$ is fitted to the data. For instance, suppose we simulate data from the emax model in Table 5, apply a dose-finding method (say MCP-Mod) to these data, and obtain linlog as the best-fitting model. Then $f_0$ denotes the emax model, $f$ denotes the linlog model, $\mathrm{PE}$ denotes the cumulative absolute deviation of $f$ from $f_0$, and $\mathrm{PE}^{0}$ denotes the cumulative absolute deviation between the fitted and true response when the data are fitted with the actual model $f_0$. The above measure is evaluated for two sample sizes, 10 and 25; we do not present results for the larger sample sizes because the methods perform quite similarly in this respect for higher sample sizes.
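The cumulative absolute prediction error is a one-dimensional integral over the dose range and can be approximated on a dose grid; the fitted and true curves below are illustrative stand-ins:

```python
import numpy as np

def cum_abs_pe(f_fit, f_true, num=2001):
    """Approximate the integral of |f_fit(d) - f_true(d)| over [0, 1] as a grid average."""
    d = np.linspace(0.0, 1.0, num)
    return float(np.mean(np.abs(f_fit(d) - f_true(d))))

f_true = lambda d: 0.2 + 0.72 * d / (0.2 + d)        # "actual" emax shape
f_fit  = lambda d: 0.2 + 0.72 * d / (0.2 + d) + 0.1  # fitted curve, shifted for illustration

pe = cum_abs_pe(f_fit, f_true)
```

Since the dose range has length one, the grid average approximates the integral directly; a curve shifted upward by 0.1 everywhere has a cumulative absolute prediction error of 0.1.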

Dose-selection performance:
Once PoC is established and a model has been selected from the candidate set, the final step of a dose-finding study consists of estimating the target dose(s) of interest. Here, the discussion is restricted to estimation of the minimum effective dose (MED). Following Bretz et al. (2005), we define the estimated MED as

$$\widehat{\mathrm{MED}} = \min \left\{ d \in (d_0, d_k] : \hat{f}(d) > \hat{f}(d_0) + \Delta \ \text{ and } \ L_d > \hat{f}(d_0) \right\},$$

with $L_d$ the lower $1 - 2\gamma$ confidence limit for the mean value at dose $d$; a typical choice of $\gamma$ is 0.1. $\Delta$ is the clinical relevance threshold that is typically suggested by the physicians. Here, we choose $\Delta = 0.4$ (similar to what is used in Bretz et al., 2005). To evaluate the dose-selection performance of the different dose-finding methods, the relative deviation of the estimated MED ($\widehat{\mathrm{MED}}$) from the actual MED is computed in each simulation run:

$$R_{\mathrm{MED}} = 100 \cdot \frac{\widehat{\mathrm{MED}} - \mathrm{MED}}{\mathrm{MED}}.$$

We present the mean, median, and interquartile range of $R_{\mathrm{MED}}$ and $|R_{\mathrm{MED}}|$ from the 10,000 simulation runs under each simulation scenario. These two measures characterize the bias and precision of $\widehat{\mathrm{MED}}$, respectively. The results are shown for sample sizes $n = 10, 25, 50$, and 75 per dose group and $\gamma = 0.1$ (corresponding to 80% level confidence intervals). It should be noted here that MCP-Mod and the MCP-Mod variants differ only in the PoC testing step but are very similar with respect to their dose-response profile estimation and dose-selection steps. Although parameter values need to be specified in the PoC testing step of the original MCP-Mod method, they are reestimated after the model selection step; conditional on the same selected model, the methods therefore give identical results. Differences arise only from the fact that the different methods sometimes select different models as the best one. One can get a clear notion of how often the methods select different models from the model selection table (see Table 10).
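Ignoring the confidence-limit condition for brevity, the MED can be read off a fitted curve by a grid search; the fitted Emax curve and Δ = 0.4 below are illustrative:

```python
import numpy as np

def med_from_curve(f_hat, delta=0.4, d_max=1.0, num=100001):
    """Smallest dose whose predicted effect over placebo exceeds delta.

    Sketch only: the lower-confidence-limit condition of the full MED
    definition is omitted. Returns None if no dose on the grid qualifies.
    """
    d = np.linspace(0.0, d_max, num)
    gain = f_hat(d) - f_hat(0.0)
    idx = np.flatnonzero(gain > delta)
    return float(d[idx[0]]) if idx.size else None

f_hat = lambda d: 0.2 + 0.72 * d / (0.2 + d)  # illustrative fitted model
med = med_from_curve(f_hat)  # analytically, the crossing is at d = 0.25
```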

Simulation results
The results of the simulation study are summarized here using the performance metrics of Section 3.2.

Detecting Dose Response Trend (DRT):
The power to detect the dose-response trend for candidate sets 1, 2, and 3 is shown in Tables 6, 7, and 8, respectively. These three tables only show $\Pr(\mathrm{DRT})$ for data simulated from models not included in the candidate set of MCP-Mod in each case. For data simulated from nonconstant models included in the candidate set, the results are in favor of the MCP-Mod method, as expected; they are shown in tables in the Supporting Information. An additional table, Table 9, reports which method performs best in dose-response trend detection for each candidate set and each data-generating shape. The constant model is included to assess the performance of the four methods in terms of controlling the Type 1 error. It is observed in Tables 6, 7, and 8 that the Type 1 error is properly controlled for MCP-Mod, MCLRe, and MHMC, but it is inflated for MCLRa for small sample sizes (see Table 6) and for large candidate sets (see Table 7). Thus, FWER control might be an issue for the MCLRa method if the sample size is small or the candidate set is large in a dose-response study. For models included in the candidate set of MCP-Mod, the power to detect the trend is better (by 2-4%) for MCP-Mod as compared to MCLRa and MCLRe (see Tables 1, 2, and 3 in the Supporting Information). This is expected because MCP-Mod uses the exact population parameter values in its candidate set; of course, this situation is rarely the case in practice. What is interesting to observe is that the MHMC method has comparable or better power than the MCP-Mod method, even though the population parameters are unknown to this method. For population shapes not included in the candidate set of MCP-Mod, the performance of MCP-Mod is expected to deteriorate, but we observe in Table 9 that it still performs better than the other methods in many situations.
For instance, for the linlog data-generating shape, MCP-Mod detects the dose-response trend with candidate set 3 as well as the other methods, even though linlog is not included in its candidate set (see Table 3). This may be attributed to the fact that the linlog and emax data-generating shapes used here (see Figure 1 and Table 5) are close to each other, so contrasts corresponding to emax dose-response models are able to capture the linlog dose-response trend sufficiently well. Similarly, the better performance of MCP-Mod in capturing the tlog dose-response trend is attributed to the closeness of the tlog shape to the exponential shape that is included in its candidate set. But when we deviate drastically from the dose-response shapes included in the candidate set of MCP-Mod, as with the second emax shape (emax2), the power drops by 2-11% for MCP-Mod as compared to the other methods. The MCLRa method outperforms the other methods across all the candidate sets for this data-generating shape. For the quadratic shape, the power to detect the dose-response trend is also better for the MCLRa method compared to the other methods: we observed a power loss of 4-11% for the MHMC method with respect to the MCLRa method under this data-generating shape. Moreover, we did not gain much power from the inclusion of the unrestricted model. Another striking observation, uniform across all the comparisons and all the data-generating shapes, is that the MHMC method detects the dose-response trend best for small sample sizes ($n = 10$).

Estimating the dose-response profile:
The model selection probability tables for candidate sets 1 and 2 are given in the Supporting Information, and the model selection probabilities for candidate set 3 are shown in Table 10. Table 10 shows that the MHMC method identifies the linear dose-response shape better than the other methods, but there is no significant difference between the methods for the other data-generating shapes. One interesting observation from the model selection probability tables is that the simpler of two similar shapes tends to be selected when candidate sets in which both emax and linlog are included are used (see the model selection probability tables in the Supporting Information). This is mainly because model selection is based on the AIC criterion, so models with fewer parameters are preferred if the model shapes are similar.
To assess the methods in scenarios where the "correct" dose-response shape is outside the candidate set, it is necessary to quantify the model selection properties of the different methods by another quantity. For this, the standardized prediction error ratio was introduced in Section 3.2; it is a reasonable measure for assessing the accuracy of the estimated dose-response curve under the different methods. Box plots of this measure for the different methods under candidate set 3 and for two sample sizes (10 and 25) are shown in Figure 2. There is no striking difference between the methods for the emax and sigmoidal Emax shapes. However, the MHMC method performs better in dose-response profile estimation under the linear dose-response curve, as was also evident in the earlier section. The distribution of the measure is comparable for all the methods, with slightly less biased estimates for the MCLRa method. Its distribution for tlog is similar to that for the sigmoidal Emax shape, justifying the frequent misclassification of tlog models as the sigmoidal Emax shape in Table 10. Thus, it is evident that even if the correct model is not included in the candidate set, there may be situations in which the approximating fitted model performs as well as the correct model. MCLRa and MCLRe perform very similarly, but the computation time of the MCLRa method is much shorter (see Table 4); it performs well for small sample sizes despite being an asymptotic method, so it is preferred over the MCLRe method. It is hard to choose between the MCLRa and MHMC methods in terms of characterizing the dose-response profile.

Dose Selection Performance:
The box plot distributions of the MED estimates under the different methods are shown in Figure 3.
The relative bias and precision of the MED estimates (measured in terms of the relative deviation from the target dose and its absolute value, as described in Section 3.2) are shown in Table 11 and Table 6 (in the Supporting Information), respectively. Both the tables and the plots indicate that the precision of the MED estimates depends on the dose-response profile; the exponential, linlog, quadratic, and power shapes lead to considerably more biased and less precise MED estimates for all the methods. Surprisingly, under the tlog shape the MED estimates are accurate even though tlog is not included in candidate set 3. This may be because the sigEmax shape approximates the tlog shape sufficiently well, as shown in Table 10. For all dose-response shapes except the linear and quadratic shapes, the MCLRa method gives less biased MED estimates than the other methods (see Table 11). The MCLRa method also estimates the MED with better precision for all nonlinear dose-response shapes at small sample sizes (see Table 11). For large sample sizes, however, the methods perform similarly. The better performance of the MCLRa method for small sample sizes may be due to its type 1 error inflation. Interestingly, for nonlinear shapes like power and sigEmax, the MED estimated by fitting the "correct" model is more biased (see Table 11) than the MED estimated under the different dose-finding methods. This further supports our argument that identifying the "correct" dose-response shape may not always be necessary; our goal is achieved if we obtain a dose-response shape that approximates the right model well.
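A common way to define the MED, sketched below, is the smallest dose whose predicted effect exceeds placebo by a clinical relevance margin Delta; the relative deviation from the target dose then summarizes estimation accuracy. The margin, the grid search, and the curves are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

# MED as the smallest grid dose whose effect over placebo reaches delta
# (assumed definition; the margin delta and grid are illustrative).
def med_on_grid(curve, delta, dose_grid):
    effect = curve(dose_grid) - curve(dose_grid[:1])  # effect over placebo
    above = np.nonzero(effect >= delta)[0]
    return float(dose_grid[above[0]]) if above.size else float("nan")

# Relative deviation of the estimated MED from the target dose
def relative_deviation(med_hat, med_true):
    return (med_hat - med_true) / med_true

grid = np.linspace(0.0, 1.0, 1001)
emax_curve = lambda d: 0.2 + 0.7 * d / (0.2 + d)    # "true" curve (illustrative)
med_true = med_on_grid(emax_curve, 0.4, grid)
fitted = lambda d: 0.2 + 0.62 * d / (0.18 + d)      # a perturbed fitted curve
med_hat = med_on_grid(fitted, 0.4, grid)
rel_dev = relative_deviation(med_hat, med_true)
```

Because the MED depends on the curve only through where it crosses the relevance margin, a "wrong" but well-approximating model can yield an accurate MED, consistent with the tlog/sigEmax observation above.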

DISCUSSION
This section summarizes the conclusions from the simulation studies. We have observed that although MCP-Mod imposes more constraints than the other methods, it performs well in many situations. However, the performance of MCP-Mod is highly influenced by the choice of candidate set. While the MCP-Mod variant MHMC performs uniformly better across all simulations and under different choices of candidate set for monotone dose-response shapes, the MCLRa method shows outstanding performance only when the dose-response shape deviates drastically from the candidate set, as for the second emax (emax2) model, where MCP-Mod failed to perform well. The disadvantage of the MCLRa method is that its type 1 error is somewhat inflated for small sample sizes and for considerably large candidate sets (with 5 or more models). But for a small candidate set such as candidate set 3, the MCLRa method performs better than the other methods in both dose selection and model selection. The robustness of the MCLRa approach under extreme scenarios like the second emax shape is attributable to the weaker constraints on candidate set selection in this approach compared with the other approaches. For the MHMC approach, the restriction to only nested monotone shapes is very conservative from a practical point of view. But for monotone dose-response shapes, the MHMC method performs better than the MCLRa method in dose-response trend detection in most of the simulation scenarios.

Table 11: Median, mean, and interquartile range (IQR) of the relative deviation from the target dose estimated under the different methods and different dose-response shapes with candidate set 3. Results are given for the "Actual" case, that is, when the correct model is fitted to the simulated data, and when the models selected under the different dose-finding methods are fitted to the simulated data.
The simulation studies show little discrimination between the MCLRa and MCLRe approaches, but the MCLRe approach is computationally more demanding.

In terms of computation time, the MCLRa method is better than the other methods.
It is worth mentioning that, despite the small differences in the simulation studies above, the methods are highly comparable; for most scenarios the model selection probability or the dose-selection power differed by only 3-4%, so it is hard to argue strongly in favor of any particular method. It is also important to note that MCP-Mod, MCLRa, and MCLRe differ essentially in the PoC testing step, while the model selection and dose selection procedures are similar for all of them. Other model selection criteria such as BIC and likelihood ratios were also explored, but no significant improvement over the AIC criterion was observed. Also note that the MHMC method is the only method in which candidate models are compared with each other in a multiple testing framework, not just with the constant model.
A benefit of MCP-Mod is that it has already been extended to the general parametric setting by Pinheiro, Bornkamp, Glimm, and Bretz (2014). Optimal design considerations and the determination of an appropriate sample size for the MCP-Mod approach have also been addressed by Pinheiro et al. (2006a). Using procedures similar to those of Pinheiro et al. (2006a) and Dette, Bretz, Pepelyshev, and Pinheiro (2008), the sample size can be estimated for the new approaches. Power and sample size computation may be challenging for the MCLRa and MHMC approaches because of the nonstandard asymptotic results used. However, Gutjahr and Bornkamp (2017) have shown how to compute rejection probabilities under both the alternative and the null distribution for the MCLRe approach, so power computation and sample size estimation at the design stage of the experiment are feasible for this approach. This is of crucial importance for clinical trials.
The MCLRa approach can also be extended to general parametric families, and it can be further modified to address heteroscedasticity in the data, which is not so straightforward in the MCP-Mod approach. The MHMC approach can also be extended to generalized linear models, but the results used in this approach need to be adjusted for non-normally distributed errors. For MCLRe, incorporating non-normality of the residuals requires extending the results to elliptically contoured distributions, which can be done following Fang and Zhang (1990) and Gupta, Varga, and Bodnar (2013).
A simple and obvious alternative to the above dose-finding approaches is to conduct a linear trend test using contrasts (Tukey, Ciminera, and Heyse, 1985) or to use dose-response modelling under a simple order restriction (Otava et al., 2014). However, the LR methodology for detecting linear as well as nonlinear trends is more powerful than these approaches and leads to more efficacious modelling under diverse scenarios. Another possible alternative to parametric approaches in dose-response studies is to use non-parametric approaches. Bornkamp and Ickstadt (2009) proposed monotone non-parametric regression in a Bayesian framework, defining the dose-response function by an appropriately scaled mixture of distribution functions. A disadvantage of such non-parametric approaches is that they cannot be applied to clinical data with a small number of dose groups. Gallant (1977) also suggested an alternative exact test for testing lack of fit when regularity conditions are violated, similar to the PoC testing step of the approaches discussed in this article. However, it has been pointed out that this approach is reasonably applicable to nonlinear models with one parameter but becomes quite awkward for nonlinear models with more than one parameter. Further, the approaches discussed in this paper could be of interest outside dose-response analysis. Testing for a trend and subsequently characterizing the correct shape is a major challenge that arises in various domains of applied science, such as economics or genomic studies.
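The linear trend contrast alternative mentioned above can be sketched as a single contrast t-test with coefficients proportional to the centered doses. The dose levels, group sizes, and simulated data below are illustrative assumptions, not the setup of Tukey et al. (1985).

```python
import numpy as np
from scipy import stats

# Simulate an increasing dose-response trend (illustrative values)
rng = np.random.default_rng(7)
doses = np.array([0.0, 0.25, 0.5, 1.0])
n_per_arm = 15
groups = [0.1 + 0.5 * d + rng.normal(0, 0.4, n_per_arm) for d in doses]

means = np.array([g.mean() for g in groups])
c = doses - doses.mean()        # linear contrast: coefficients sum to zero
c /= np.sqrt(np.sum(c ** 2))    # normalization (optional, for comparability)

# Pooled variance estimate and one-sided t-test for an increasing trend
df = sum(len(g) - 1 for g in groups)
s2 = sum(np.sum((g - g.mean()) ** 2) for g in groups) / df
t = (c @ means) / np.sqrt(s2 * np.sum(c ** 2 / n_per_arm))
p_one_sided = stats.t.sf(t, df)  # small p supports a dose-response trend
```

This single-contrast test is sensitive mainly to near-linear trends; the multiple-contrast and likelihood ratio approaches discussed in this paper spread power over several candidate shapes instead.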

ACKNOWLEDGMENTS
This work is supported by funding from the European Union's Horizon 2020 research and innovation program IDEAS ("Improving Design, Evaluation and Analysis of early drug development Studies"; www.ideas-itn.eu) under the Marie Sklodowska-Curie grant agreement No 633567. We are also grateful to the reviewers for their valuable comments and feedback.

CONFLICTS OF INTEREST
The authors have declared no conflicts of interest.