Abstract
A thoughtful, clearly defined research question should be the foundation of any clinical trial or research study. The research question helps determine key study methods, and defining a specific research question helps avoid problems with inadequate sample size, inappropriate design, or multiple statistical comparisons. Rationales and strategies for formulating research questions and using them to define study protocols are discussed, with a focus on application in clinical trials.
Veterinary clinical trials are experiments that test medical treatments in client-owned animals with naturally occurring health conditions. As veterinarians become increasingly familiar with the principles of evidence-based medicine, they are looking to the results of clinical trials to guide their practice. However, the methodological quality of published veterinary clinical trials is often poor,1–4 which means that readers could draw incorrect conclusions from trial results and in turn make unfounded changes in patient care. Results of a recent review of study power in reports of small animal trials indicated that not only did most trials contain methodological flaws, but only a handful of studies even stated the primary research question.4 Whereas this may seem impossible (after all, how can an experiment have no question?), it is actually a well-recognized problem in clinical research: the general idea is clear, but the specific study question is inadequately defined.5 In trying to address large knowledge gaps with minimal resources, veterinary investigators often pose a broad research objective, design a study that will maximize the number and types of comparisons that can be made, comb the data to find interesting associations, and then devise explanations as to what these findings mean. Although well intentioned, trials that utilize this exploratory data-driven approach can result in biased or misleading results and are typically improperly designed to answer the questions of greatest clinical relevance. Of particular concern is the potential to arrive at false conclusions because of inappropriate study design, inadequate sample size, multiple statistical comparisons, or a combination of these issues. 
These are serious problems that affect the validity of clinical trials and other common research designs in veterinary medicine, such as retrospective case series and cohort studies as well as preclinical studies (eg, those involving purpose-bred animals or cadavers).
Development of a primary research question is essential to producing clinically relevant study results that can be applied to evidence-based practice.5–8 A well-defined and specific research question helps determine the appropriate study design, sample population, and methods of data collection and analysis necessary to meaningfully address the knowledge gaps of greatest importance.8 In this article, I will review how to define a research question, explain how the research question determines key methodological elements of a study, and discuss common pitfalls of clinical studies that fail to clearly define a primary research question. Although the discussion will focus on clinical trials, the principles can and should also be applied to other types of study designs commonly encountered in veterinary medicine, including retrospective and preclinical studies. It is hoped that this review and discussion will encourage veterinary investigators in all disciplines to successfully apply these principles and to incorporate a clearly defined primary research question when designing and conducting their studies.
Defining the Research Question
The first step in formulating a clinical research question is to gather information about the perceived knowledge deficit of interest. Typically, this involves a systematic search of the published veterinary and human medical literature. Currently, true systematic reviews and meta-analyses9 are rare in the veterinary literature; therefore, researchers should review primary sources (such as original peer-reviewed articles) to understand what has been studied and reported on the topic to date. Enlisting the aid of a research librarian can help ensure that important citations are not missed during the information-gathering phase. Valuable information can also be gained via meetings with subject matter experts and focus groups with animal owners. The perspective of these individuals may be instrumental in determining the clinical relevance of proposed study questions and outcomes.
Many questions about a subject area will likely be generated during the aforementioned process. A single trial usually cannot address multiple research questions without introducing undue complexity of design and analysis.6 Therefore, although investigators would typically like to answer many questions, it is advisable to establish a single primary research question on which the trial will focus.5 One of the challenges in developing an appropriate research question is determining not only which clinical uncertainties should be studied, but which can be studied given the available resources. Initial ideas must be narrowed into questions that can be adequately answered or tested. The primary study question should be the answerable question that is of greatest interest to investigators and other relevant stakeholders, such as veterinarians and animal owners.5,6,10 It is the question around which the study is primarily designed and which is emphasized in reporting of study results. The primary question is formulated into a specific hypothesis stating what the investigators expect to find; in a clinical trial, the hypothesis typically states how the primary outcome in the experimental group is expected to compare with the outcome in the control group. The hypothesis is tested in the trial by analyzing the aggregated primary outcomes of subjects in each group.11
Secondary questions relating to the primary study question can also be formulated. There are 2 main types of secondary study questions in a clinical trial: outcome questions and subgroup questions.5 Secondary outcome questions ask whether trial groups differ with respect to response variables other than the outcome assessed for the primary study question. For example, if the primary study question relates to whether a special renal diet, compared with a standard diet, improves overall survival time in cats with chronic kidney disease, secondary outcomes might include changes in creatinine concentration, body weight, or incidence of hypertension. Secondary subgroup questions examine whether subsets of subjects with certain characteristics differ with respect to the effect of the intervention on primary and secondary outcomes. Continuing with the above example, a subgroup question might evaluate the effect of the 2 different diets on overall survival time among cohorts of cats with early stage disease. Subgroup questions should be designed to evaluate clinically plausible effects of the intervention on subjects with different baseline characteristics.12
Primary and secondary research questions and their associated hypotheses should be written as specifically as possible. It is not sufficient to ask, “Is X better than Y?” or to hypothesize that “outcome with A will be superior to B,” because such vague language can be interpreted in a multitude of ways. Methodologists recommend the use of a structured approach to develop a well-defined research question.11 An optimal research question for a clinical trial will define the population being studied, the experimental and control interventions, the specific details of administration, and the specific primary outcome, including its time frame.13 These elements are commonly referred to as the PICOT (population, intervention, comparator, outcome, and time frame) format (Appendix 1). Published studies that define the primary research question with the PICOT format generally have higher methodological quality and clarity of reporting, compared with studies that lack a structured research question.14
It is essential that all primary and secondary research questions be completely formulated prior to planning or implementing a study. The questions should drive the data collection—not the other way around. The purpose of a clinical study is to answer well-formulated and clinically relevant questions, not to seek out statistical relationships between variables and retroactively rationalize their meanings. Carefully selecting and defining the research question at the beginning of a trial reminds us why we undertook the research in the first place. It also provides a safeguard against formulating questions simply to fit the results or changing course midstudy in response to interesting or unexpected results.15 An unambiguous research question provides a blueprint for an efficient trial that achieves a specific goal.
Determining Key Elements of Protocol Design
With the research question clearly defined, designing the study protocol becomes a relatively streamlined process. The parameters delineated in the research question dictate the design elements required to obtain a meaningful answer.5,6,11,16 To illustrate this principle, I will focus on 3 important elements of trial design: the comparator group, outcomes, and sample size. However, the ideas presented are also applicable to other elements of study design.
Most clinical trials are comparative in nature and require a control group against which the effects of the experimental treatment will be assessed. The optimal comparison will differ depending on the research objectives. Many choices must be made, including whether the control treatment is an active treatment, an inactive (placebo) treatment, or both; whether the purpose of the comparison is to test the superiority or non-inferiority of the experimental treatment relative to the control treatment; and how subjects will be allocated to experimental groups. The chosen comparison will in turn influence the inclusion criteria, blinding, analytic plan, and so on. A structured research question will help direct the appropriate choices for each of these study design elements, whereas confusion ensues when the research question is not clearly specified. For example, consider the following research question:
In adult dogs weighing > 15 kg (33 lb), does treatment with carprofen before laparoscopic ovariectomy improve postoperative outcome?
In this instance, the best control group and other methodological choices are necessarily guesstimates because the comparison and outcomes of interest are unspecified. (What is the outcome? What is being compared? How much improvement is relevant?) Protocol decisions must be made to proceed with the study, but without a clear study question, it is easy to end up in a situation where arbitrary decisions about trial design dictate what questions can be answered, rather than the other way around. This occurs because the same clinical objective can often be met through various comparisons, quantified in multiple ways, or described by multiple outcomes.11 Each potential control comparison and definition of improved outcome might require different types or numbers of measurements, blinding of different study personnel, and use of different sample sizes and analytic plans. Thus, vague research questions can easily result in excessive, clinically irrelevant, or noninformative comparisons and outcome assessments.8 Investigators who set out to explore differences between treatments may easily become distracted by statistical differences in peripheral outcomes while failing to realize that the study is not properly designed to answer questions of real clinical and scientific importance. As such, it is essential that there be a clinical or scientific rationale for each trial methodological element as guided by the prespecified primary and secondary research questions. When the study question is poorly formulated, the overall trial structure, comparator arm, outcome assessments, and analysis plan may all be suboptimal in their ability to provide clinically relevant data. An even more problematic issue is that statistical differences identified by evaluating many different associations may be entirely spurious.
When the research question is clearly specified, these issues are more easily avoided. Consider this revised research question:
In adult dogs weighing > 15 kg that undergo laparoscopic ovariectomy, does SC injection of carprofen (2.2 mg/kg [1 mg/lb]) versus an equivalent volume of saline (0.9% NaCl) solution at induction of general anesthesia result in greater activity monitor counts over the 24 hours immediately following surgery, controlling for age and baseline activity?
Now, the control group is specified, the purpose of the comparison is to determine superiority, and the enrolled dogs should be randomized, as there is no rationale for a historical or nonrandom comparison. Other study design elements are also easy to determine, such as the inclusion criteria, method, and timing of outcome measures. Furthermore, the statistical analysis plan can be determined at the outset because the associations of interest are already defined, thereby reducing the likelihood of false-positive associations as a result of excessive statistical comparisons.
Estimating the Appropriate Sample Size
The primary research question should also dictate the sample size of a clinical trial. Although not often reported in veterinary studies, power and sample size calculations are valuable in studies other than randomized trials, such as to determine the required number of experimental units in a preclinical study or the size of a cohort necessary to make a retrospective comparison. By use of the expected value of the primary outcome in the control group, one can estimate the number of animals needed to have a reasonably high power (a common target is at least 80%) of detecting the smallest relevant outcome difference between groups, if it exists. The difference between groups that you would like to detect should not only be clinically relevant but should also represent a plausible effect of the intervention.10 For example, whereas there is no doubt that a 100% improvement in survival time among dogs receiving one chemotherapy agent versus another would be clinically relevant, in most instances it would be unreasonable to expect such an effect (and probably unnecessary to perform a comparative trial to identify it). In a retrospective cohort study, a sample size calculation determines the number of animals that must be studied to make relevant comparisons, which can in turn dictate the databases and time frames from which data are collected. Similarly, the number of cadaver limb constructs needed for a particular preclinical study will depend on the outcome of interest and the magnitude of difference considered relevant. In the laparoscopic ovariectomy study example discussed, we might have prior evidence that the minimum clinically relevant difference in postoperative activity monitor counts between treatment groups is 25% (that is, differences of < 25% between groups do not correspond to any measurable benefit to the dogs).
Therefore, to have a high probability of statistically documenting this effect size if it exists, we would calculate the number of dogs needed in each group to have a high power (≥ 80%) to detect a difference of ≥ 25%. If the subsequent study indicated no statistically significant difference in activity counts between groups, we could be reasonably confident that a meaningful improvement in activity counts was not missed.
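The arithmetic behind such a calculation can be sketched with the standard two-sample normal-approximation formula. The specific numbers below (a control mean of 1,000 activity counts, so a 25% difference corresponds to 250 counts, and an assumed standard deviation of 300 counts) are hypothetical values chosen for illustration, not data from the studies discussed:

```python
import math

# Standard normal quantiles for the conventional design choices
Z_ALPHA = 1.96    # two-sided alpha = 0.05 -> z at the 0.975 quantile
Z_BETA = 0.8416   # power = 80% -> z at the 0.80 quantile

def n_per_group(delta, sigma, z_alpha=Z_ALPHA, z_beta=Z_BETA):
    """Approximate number of subjects per group needed to detect a mean
    difference `delta` between two groups sharing standard deviation
    `sigma` (two-sample normal-approximation formula)."""
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical ovariectomy example: detect a 250-count difference
# (25% of a control mean of 1,000) with an assumed SD of 300 counts.
print(n_per_group(delta=250, sigma=300))  # → 23 dogs per group
```

Under these assumptions, roughly 23 dogs per group would be required. In practice, the appropriate formula depends on the outcome type and analytic plan, so investigators should use validated power-analysis software and consult a statistician rather than rely on this simplified sketch.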
A trial of a specific sample size cannot necessarily answer multiple related questions with the same precision. For example, the number of animals required to demonstrate non-inferiority is usually substantially larger than that required to establish superiority.17 Similarly, a study with high power to detect a certain difference in the primary outcome of interest will not necessarily be able to identify important differences in secondary outcomes or across subgroups. However, planning the study sample size around the primary research question helps ensure that at least 1 key clinical question can be answered by the trial. On the other hand, it is virtually impossible to estimate the sample size required to answer a question that has not been posed.
The necessary sample size should be considered during the early stages of study design. Underpowered studies are a major problem in veterinary medicine.4 They result in inefficient use of time, money, and patients; misinterpretation of results; and inappropriate changes in patient care. Many well-formulated and clinically relevant research questions cannot feasibly be tested in a clinical trial because the sample size required to answer them is unrealistically large. Constraints on sample size in veterinary medicine are particularly severe because of lack of funding and the coordinated multicenter infrastructure required to complete large trials. The median sample size of veterinary clinical trials published between 2000 and 2012 was only 32 to 40 animals.2,4 If a sample size calculation indicates that an unworkable (ie, very large) number of animals is needed, the primary study question should be revised. This involves repeating the steps involved in generating the original study question and identifying a different clinically relevant question that can be answered with the available resources. The original question can often still be included as a secondary hypothesis-generating outcome. This approach will help prevent the situation in which a trial has virtually no chance from the outset to meaningfully inform clinical practice.
Using an Appropriate Analytic Plan
In addition to promoting the development of efficient study design and appropriate sample size, a well-defined research question provides the framework for an applicable data analysis plan. As for other elements of study design, the primary analytic methods should be stated in advance and based on the primary research question. In the absence of a clear statistical plan, misleading results are likely.12,15 To illustrate this issue, consider how a poorly designed research question can contribute to multiplicities that threaten the validity of statistical conclusions and their clinical application.
Multiplicities refer to multiple comparisons and inferences that investigators make from the same data, and commonly arise in clinical trials and other veterinary research designs. Multiplicities substantially inflate the probability of making a type I error, or false-positive conclusion. Multiple comparisons arise as a result of comparing multiple outcomes and treatment arms, testing repeated measurements over time, taking multiple looks at the data as it is collected, and analyzing the data according to various subgroups of interest (Appendix 2).18 Although there is inherent multiplicity in many study designs, studies with inadequately defined research questions are particularly likely to include several types of multiple comparisons. If there is no primary research question to guide the analysis, investigators are likely to use a data-driven approach whereby a large amount of data is collected and subsequently mined for statistically significant outcomes, with many permutations of treatment arms, time points, and subgroups. Common problematic (hypothetical) examples in veterinary medicine include the following: 10 horses randomized to 2 anesthetic protocols where 20 different physiologic and blood parameters are collected and compared at 10 different time points; and a retrospective study of 40 cats with intestinal cancer where each CBC and serum biochemical test result is tested for its association with survival time across all animals and then tested again across several smaller subgroups of cats categorized according to intestinal location, lymph node status, and cancer type. Even if the 2 equine anesthetic protocols were in fact identical or if none of the cats' blood parameters were associated with survival time, we would still expect a number of statistically significant findings in these studies on the basis of chance alone. 
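The expected burden of chance findings in the hypothetical equine example can be quantified directly. The sketch below assumes the tests are independent and that no true differences exist, which are simplifying assumptions made purely for illustration:

```python
def familywise_error(alpha, k):
    """Probability of at least one false-positive result when k
    independent tests are each run at significance level alpha
    and no true differences exist."""
    return 1 - (1 - alpha) ** k

def expected_false_positives(alpha, k):
    """Expected number of false-positive tests under the same assumptions."""
    return alpha * k

# Hypothetical equine anesthesia example from the text:
# 20 parameters x 10 time points = 200 comparisons at alpha = 0.05.
k = 20 * 10
print(round(familywise_error(0.05, k), 4))  # essentially certain (> 0.999)
print(expected_false_positives(0.05, k))    # about 10 chance findings expected
```

Even with identical anesthetic protocols, approximately 10 of the 200 comparisons would be expected to reach statistical significance by chance alone, and at least one spurious finding is all but guaranteed.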
Thus, with enough testing, false-positive results will inevitably occur, but it is not possible to determine which findings are real and which are not.19 Researchers naturally concentrate on the associations for which results appear most favorable, which in turn requires the generation of explanatory theories to justify the observed findings. When false-positive results occur, incorrect information is propagated, which can then derail appropriate patient care and research direction.
Problems of multiplicity are compounded when the study power is low and the associations being tested lack a scientific or clinical basis.20 When a test has low power and there is truly no difference between groups for most of the comparisons being made, more than half of all results that are significant at the P = 0.05 level can be false-positive findings.21 This problem may be further exacerbated when tests are correlated rather than independent.22 The potential for false-positive results is often obscured because published manuscripts generally only report tests that achieved statistical significance, rather than indicate every test that was performed or explored.3,12
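The "more than half" figure follows from simple arithmetic on the positive predictive value of a significance test. The inputs below (20% power and only 10% of tested associations being truly nonnull) are hypothetical values chosen to illustrate the phenomenon, not estimates from any particular study:

```python
def false_positive_fraction(power, alpha, prior_true):
    """Fraction of all 'significant' results that are false positives,
    given test power, significance level alpha, and the proportion of
    tested hypotheses that are actually true (prior_true)."""
    true_pos = power * prior_true           # real effects correctly detected
    false_pos = alpha * (1 - prior_true)    # null effects flagged by chance
    return false_pos / (true_pos + false_pos)

# Low power (20%) and mostly null hypotheses (10% truly nonnull):
print(round(false_positive_fraction(power=0.20, alpha=0.05,
                                    prior_true=0.10), 2))  # → 0.69
```

Under these illustrative conditions, roughly 69% of statistically significant results would be false positives; with adequate power (say, 80%) and well-justified hypotheses, the same arithmetic yields a far smaller fraction.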
A clearly defined primary research question can help mitigate the risk of excessively high rates of false-positive results. An efficient study design formulated on the basis of specific, relevant hypotheses is one of the central protections against multiplicity21 because it tends to limit the number of subgroups and other sources of multiple comparisons.12,15,20 Nevertheless, it is nearly impossible to design a clinical trial or other veterinary study without introducing 1 or more sources of multiplicity. Because of this, experts generally agree that statistical adjustments are necessary in confirmatory trials to reduce the odds of finding false-positive results.12,15,20–23 Many adjustment procedures exist that can be applied to different data situations.23 When the trial objectives are clearly defined at the outset, a valid analytic strategy can be planned ahead of time on the basis of the pathophysiology of the disease being studied and the interventions of interest.20 If data-driven hypotheses are tested, whether as secondary outcomes in a clinical trial or as part of an exploratory retrospective design, these should be clearly indicated and considered hypothesis-generating; the P values for such tests typically do not predict what could occur if the hypotheses were tested separately in another trial.
Conclusion
There are several compelling reasons to define a primary research question prior to planning or implementing a veterinary clinical research study. In this article, I have provided an overview of the main reasons to develop a specific research question and highlighted some of the negative consequences that can befall trials that fail to do so. The goals of designing a trial or other study are 2-fold: first, we want a trial that is capable of answering at least 1 important clinical question, and second, we wish to limit the influence of factors that might cause us to arrive at an incorrect or incomplete answer. Developing a clearly defined research question is an important first step toward achieving both of these goals.
Acknowledgments
The author thanks Dr. Susan Ellenberg for expert review and commentary.
References
1. Lund EM, James KM, Neaton JD. Veterinary randomized clinical trial reporting: a review of the small animal literature. J Vet Intern Med 1998; 12: 57–60.
2. Brown DC. Control of selection bias in parallel-group controlled clinical trials in dogs and cats: 97 trials (2000–2005). J Am Vet Med Assoc 2006; 229: 990–993.
3. Sargeant JM, Thompson A, Valcour J, et al. Quality of reporting of clinical trials of dogs and cats and associations with treatment effects. J Vet Intern Med 2010; 24: 44–50.
4. Giuffrida MA. Type II error and statistical power in reports of small animal clinical trials. J Am Vet Med Assoc 2014; 244: 1075–1080.
5. Friedman LM, Furberg CD, DeMets DL. Fundamentals of clinical trials. 4th ed. New York: Springer, 2010.
6. Haynes RB. Forming research questions. J Clin Epidemiol 2006; 59: 881–886.
7. Farrugia P, Petrisor BA, Farrokhyar F, et al. Research questions, hypotheses, and objectives. Can J Surg 2010; 53: 278–281.
8. Thabane L, Thomas T, Ye C, et al. Posing the research question: not so simple. Can J Anaesth 2009; 56: 71–79.
9. Haase SC. Systematic reviews and meta-analysis. Plast Reconstr Surg 2011; 127: 955–966.
10. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010; 340: c332.
11. Piantadosi S. Clinical trials: a methodologic perspective. 2nd ed. Hoboken, NJ: Wiley, 2005.
12. Yusuf S, Wittes J, Probstfield J, et al. Analysis and interpretation of treatment effects of subgroups of patients in randomized clinical trials. JAMA 1991; 266: 93–98.
13. Rios LP, Ye C, Thabane L. Association between framing of the research question using the PICOT format and reporting quality of randomized controlled trials. BMC Med Res Methodol 2010; 10: 11.
14. Mayo NE, Asano N, Barbic SP. When is a research question not a research question? J Rehabil Med 2013; 45: 513–518.
15. Tukey JW. Some thoughts on clinical trials, especially problems of multiplicity. Science 1977; 198: 679–684.
16. Sackett DL, Wennberg JE. Choosing the best research design for each question. BMJ 1997; 315: 1636.
17. Jones B, Jarvis P, Lewis JA, et al. Trials to assess equivalence: the importance of rigorous methods. BMJ 1996; 313: 36–39.
18. Proschan MA, Waclawiw MA. Practical guidelines for multiplicity adjustment in clinical trials. Control Clin Trials 2000; 21: 527–539.
19. Westfall PH, Young SS. Resampling-based multiple testing: examples and methods for p value adjustment. New York: Wiley, 1993.
20. Bender R, Lange S. Adjusting for multiple testing—when and how? J Clin Epidemiol 2001; 54: 343–349.
21. Sterne JAC, Smith GD. Sifting the evidence—what's wrong with significance tests? BMJ 2001; 322: 226–231.
22. Bauer P, Chi G, Geller N, et al. Industry, government, and academic panel discussion on multiple comparisons in a “real” phase three clinical trial. J Biopharm Stat 2003; 13: 691–701.
23. Sankoh AJ, Huque MF, Dubey SD. Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Stat Med 1997; 16: 2529–2542.
Appendix 1
A structured approach to defining the research question in a clinical study: the PICOT criteria.
In ______P______, how does ______I______ compared with ______C______ affect ______O______ over ______T______?
Population: the target group to which the study is most relevant and to which results are intended to generalize.
Intervention: the specific condition, behavior, test, medication, procedure, etc being studied. For experimental studies, this will be a direct treatment or action to which study subjects are assigned by investigators; for observational studies, this will be a characteristic or intervention that exists in or is applied to study subjects independent of the research study.
Comparator: the alternative to the intervention being studied. This might be the absence of the intervention or a placebo or active control. Experimental studies always have a comparison, whereas some observational studies do not.
Outcome: the parameter that will be measured to determine the effect of the intervention on the population.
Time: the time frame over which the outcome will be measured.
Appendix 2
Common sources of multiple comparisons that can be encountered when formulating data analysis plans for clinical studies.
• Testing an excessive number of explanatory variables
• Testing an excessive number of outcome variables
• Testing different definitions of the same outcome
• Repeating tests across multiple subgroups of subjects
• Testing the same variable in multiple formats
• Repeating the same comparisons with different statistical tests
• Defining groups or levels of variables on the basis of the data
• Changing criteria used to include or exclude subjects from analysis
• Repeating tests both within and between treatment groups
• Adding subjects or experiments and repeating analyses until statistical significance occurs