Designing studies that answer questions

Susan Shott, PhD
Statistical Communications, PO Box 671, Harvard, IL 60033.

Why It Matters

Veterinary advances are often based on experiments in which new medications, devices, surgical procedures, or other treatments are tested. Because the results of these experiments can dramatically change the way veterinary medicine is practiced, it is essential that experiments be well designed. Bad experimental designs create problems that cannot be corrected by any statistical procedure. Veterinarians who understand the basic principles of experimental design can avoid errors when they design their own studies, and they can detect design errors in research by others.

Experimental design is the structure of an experiment: the experimental procedures, the selection of subjects, and the assignment of subjects to different experimental procedures. Sound experimental designs reduce the risk of confounded effects, which arise when the effects of the treatments are so hopelessly mixed up with other effects that treatment effects cannot be determined.

Suppose a veterinarian has developed a new surgical technique for repairing luxated patellae in dogs. Because she believes her technique is much better than traditional techniques, she wants to ensure that dogs with severe luxations benefit from it. She uses the new technique on dogs with grade 3 or 4 luxations and a traditional technique on dogs with grade 1 or 2 luxations. She then compares the new and traditional techniques with respect to degree of subsequent osteoarthritis and recurrence of luxation after surgery. The design of this experiment guarantees that the effect of the new technique is completely confounded with the effect of severe luxation. Even if the new technique is better, severe luxation may cancel out its increased effectiveness. The new technique may appear no better or even worse than the traditional technique. This experimental design creates a bias against the new technique.

Because this experiment is completely confounded, no statistical method can repair the damage and the experiment is worthless. Even when an experiment is only partly confounded, the results of statistical adjustments are inferior to those of a well-designed study without confounding.

If the conclusions of confounded experiments are accepted, the consequences may be quite harmful. Beneficial treatments may be ignored or useless treatments adopted. Veterinary hospitals may make bad decisions that cost them money. Regulatory agencies may adopt policies that harm the animals they intended to help.

Requirements for Sound Experimental Designs

A sound experimental design includes all of the following safeguards:

  • Control groups.

  • Placebos or sham procedures.

  • Blinded procedures.

  • Random assignment of subjects to study groups.

  • Adequate statistical power.

The rationale for each of these safeguards as well as the possible consequences of omitting them will be discussed here.

Control Groups

A control group is a group of subjects that are not exposed to the treatments or procedures being studied. Control subjects may have nothing done to them (except for obtaining of measurements), or they may be assigned to nonexperimental treatments or procedures. A treatment group is a group of subjects that are assigned to undergo the treatments or procedures being studied. The reasoning behind control groups is quite simple: if we cannot compare the treatment group to anything, we cannot determine whether the treatment works.

For example, a study1 included a control group to evaluate the viral clearance effect of a dietary aluminosilicate supplement in pigs infected with porcine circovirus type 2. The pigs in the treatment group received the supplement, whereas the pigs in the control group did not. If a control group had not been available for comparison, there would have been no way to determine whether the supplement had an effect.

A widespread research myth claims that control groups are not needed because before-versus-after comparisons allow subjects to serve as their own controls. Changes after treatment are simply assumed to be caused by the treatment when, in fact, they could be attributable to other factors, such as temporary remission or acclimation to being handled. Without a control group, it is impossible to determine whether such changes are due to the treatment or something else. In studies that concern paired body parts, such as eyes or knees, one of each subject's paired body parts can often be assigned to the control group and the other to the treatment group. In this type of study, subjects do serve as their own controls, and the statistical tests chosen must account for these intraindividual pairings.

Researchers sometimes try to reduce research costs by omitting control groups. The cost of a control group is small, however, compared with the potential cost of a faulty design. The results of a badly designed experiment may have to be thrown out, making the entire experiment a waste of time, effort, and money as well as a potential source of unjustifiable animal discomfort, distress, or death.

Placebos and Sham Procedures

The control and treatment groups should be as similar as possible. Ideally, they should differ only with respect to the treatment of interest. This allows us to attribute differences in outcomes between the groups to the treatment rather than some other factor. Many experiments make use of placebos or sham procedures to increase the similarity of treatment and control groups. Placebos are inert substances administered in the same way as treatment medications. Sham procedures are surgical or other procedures that closely resemble treatment procedures but are not intended to provide effective treatment.

For example, a placebo was used in a study2 of a stannous fluoride gel used to treat bacterial skin infections in horses. Horses with skin infections received the fluoride gel or a placebo gel. Improved clinical and pruritus scores were detected for the treatment group but not the placebo group. Had a placebo gel not been used, it would have been impossible to determine whether these changes were attributable to fluoride or other components of the gel.

A sham procedure was used in a study3 of gold bead implantation as an analgesic treatment for hip dysplasia. About half of the study dogs had small gold pieces inserted through needles at 5 acupuncture points. The other dogs received a sham treatment in which the skin was penetrated with a needle at 5 nonacupuncture points. Both groups underwent the same anesthesia and hair clipping and received the same needle type. The dogs treated with gold implantation had greater improvements than did dogs that received the sham procedure. Had a sham procedure not been performed, it would not have been possible to determine whether these improvements were caused by gold implantation or other parts of the treatment. Confounding by a placebo effect also could not have been ruled out. A placebo effect can occur when a placebo is used but patients' owners think their animals are getting better because they believe the animals received effective treatment.

Blinded Procedures

A study is double blinded if neither the researchers nor the subjects' owners or caretakers know which subjects are in the control group and which are in the treatment group until the study has ended. Someone who is not directly involved with the study procedures, such as a statistician or pharmacist, has a list showing to which study group each subject belongs. In a single-blinded study, either the researchers know the treatment status and the subjects' owners or caretakers do not, or the subjects' owners or caretakers know the treatment status and the researchers do not. Triple-blinded studies, in which no one knows the subjects' treatment status, are to be avoided.

Blinded studies are also called masked studies. The term masked is coming into more common use because references to blinding subjects have rather unpleasant connotations. To avoid confusion, the term masked is preferred by many ophthalmologists.

Blinded procedures reduce bias. If subjects' owners or caretakers know that placebos are being used, they may exaggerate the severity of clinical signs that they would dismiss if they knew that their animals were given real medications. They are also quite likely to withdraw their animals from the study if they know their animals are not receiving effective treatment. When a researcher knows a subject's treatment status, the subject's responses and the researcher's interpretation of those responses can be affected, thereby biasing the study results. This sort of bias may be completely unintentional, subconsciously influencing the interpretations of even the most conscientious researcher.

For example, a study4 of Podophyllum sp as a homeopathic treatment for neonatal calf diarrhea involved double-blinded procedures. Calves with naturally occurring diarrhea were treated with a Podophyllum product or a placebo. Neither the researchers nor the farmers knew what each calf was given. The farmers used a 4-point scale to rate calf health each day, with a score of 0 indicating a clinically normal calf and a score of 3 indicating signs of severe depression, anorexia, fever, or watery feces. The possibility that the farmers' ratings were biased by knowledge of the treatment given was eliminated. Such bias could not have been ruled out if blinded procedures had not been used because assessment of calf health was partly subjective.

Blinded procedures cannot always be used. If a study were conducted to compare the effects of surgery with those of chemotherapy, for example, researchers and subjects' owners or caretakers would obviously know which treatment was given. Findings of experiments without blinded procedures must be interpreted with caution because of the risk that knowledge of the study group assignment biased the responses of subjects or researchers.

Randomization

Randomization, or random assignment, is the assignment of subjects to treatment and control groups by use of some chance-governed mechanism. For example, subjects might be randomly assigned by tossing a coin and letting heads dictate assignment to the control group and tails dictate assignment to the treatment group. In practice, the coin-tossing method of randomization is not recommended because coins tend to be rather inconsistent in their tossing properties. A computer program or random number table is generally used to randomize group assignments.
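The computer-based approach can be sketched as follows. This is a minimal illustration, not a method described in the article; the subject names and group labels are assumed for the example. Shuffling a balanced list of labels, rather than tossing a coin for each subject, also guarantees equal group sizes.

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign each subject to 'treatment' or 'control'.

    A balanced list of labels is shuffled so that group sizes are
    as equal as possible, unlike repeated coin tosses.
    """
    rng = random.Random(seed)  # seeding makes the assignment reproducible
    half = len(subjects) // 2
    labels = ["treatment"] * half + ["control"] * (len(subjects) - half)
    rng.shuffle(labels)
    return dict(zip(subjects, labels))

# Hypothetical example: 10 dogs split evenly between the two groups.
assignments = randomize([f"dog{i}" for i in range(1, 11)], seed=42)
```

Recording the seed (or the generated list itself) lets a third party, such as a statistician, hold the allocation key in a blinded study.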

If confounded effects are to be avoided, the control and treatment groups must be similar with respect to any characteristic that could affect the results (eg, age, sex, and breed). It might seem that the best way to make groups similar is to deliberately match them with respect to all of the characteristics that seem important. A researcher could try to assign subjects so that all groups have similar breed and sex compositions, similar age distributions, and so forth. When there are many characteristics to distribute evenly between groups, however, it is quite difficult to do so. In addition, researchers rarely know all of the characteristics that could influence the results. Characteristics that are not taken into account when subjects are assigned to groups may turn out to be important, producing confounded effects.

Yet another problem is selection bias, which is bias in the assignment of subjects to treatment and control groups. Selection bias is almost inevitable when the researcher decides which subjects are assigned to treatment and control groups, and it need not be conscious to have an effect. Suppose horses in a treatment group will undergo frequent and intensive exercise sessions. The researcher assigning horses to treatment and control groups may, out of sympathy, assign to the treatment group only horses that are in good physical condition. If this is done, the effect of the treatment will be confounded with the effect of good physical condition. This creates a bias in favor of the treatment.

Such problems can often be avoided by the random assignment of subjects. Selection bias is eliminated by randomization because the researcher does not decide which subjects go into treatment groups and which go into control groups. Practical difficulties are reduced because randomization is easy to carry out. When large enough groups are used (ideally, at least 100 subjects/group for a 2-group experiment5), the treatment and control groups are likely to be similar.

When small groups are used, randomization can produce dissimilar groups. The risk of getting dissimilar groups increases as the group size decreases. Researchers should always check to make sure that their groups are similar with respect to characteristics that might be important. Even when the groups are small, however, randomization is essential to eliminate human bias when assigning subjects to groups.

When a researcher knows before conducting a study that certain characteristics will influence the results, stratified randomization can be used to make the study groups similar with respect to those characteristics. Stratified randomization is carried out by separating subjects into subgroups (strata) that are defined by important characteristics, then randomly assigning the subjects in each subgroup to a study group. For example, suppose we know that sex will affect the results of a study that concerns neutered ferrets. We can obtain groups with nearly identical sex distributions by randomly assigning the females to study groups and separately randomly assigning the males to study groups.
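The stratified version of random assignment can be sketched as follows (again a hypothetical illustration, not code from the article): a separate balanced randomization is performed within each stratum, so every study group receives a similar share of each stratum.

```python
import random

def stratified_randomize(subjects_by_stratum, seed=None):
    """Randomize within each stratum so that study groups have
    similar stratum compositions (e.g., similar sex distributions)."""
    rng = random.Random(seed)
    assignments = {}
    for stratum, subjects in subjects_by_stratum.items():
        # Balanced randomization within this stratum only.
        half = len(subjects) // 2
        labels = ["treatment"] * half + ["control"] * (len(subjects) - half)
        rng.shuffle(labels)
        for subject, label in zip(subjects, labels):
            assignments[subject] = label
    return assignments

# Hypothetical ferret study stratified by sex: each group ends up
# with 2 females and 2 males.
ferrets = {
    "female": ["f1", "f2", "f3", "f4"],
    "male": ["m1", "m2", "m3", "m4"],
}
groups = stratified_randomize(ferrets, seed=1)
```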

Statistical Power

Even a randomized, controlled, double-blinded experiment is likely to fail when the sample size is inadequate. When the sample is too small, there is little chance of finding a difference or relationship. Statistical power is the chance of detecting a difference of a certain size or a relationship if one exists. The minimum scientifically acceptable power is 80%. This power is rather low; it means that the chance of failing to detect a difference or relationship is 20%. Researchers settle for this low power because extremely large sample sizes are usually required to obtain high power values such as 90% or 95%.

Before a study is conducted, a statistical power analysis should be carried out to determine the sample size needed to ensure adequate power. A power analysis can be obtained by consulting a statistician or using power analysis software. It may be tempting to skip the power analysis and just scavenge up the subjects that can be easily enrolled in a study. There is little point, however, in expending time, money, and effort on a study that has only a small chance of finding a difference or relationship.
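As a rough illustration of what a power analysis computes, the textbook normal-approximation formula for comparing two means gives a per-group sample size of n = 2((z₁₋α/₂ + z₁₋β)σ/δ)², where δ is the smallest difference worth detecting and σ is the assumed standard deviation. This sketch is not a method described in the article, and real power analyses (via a statistician or power analysis software) account for the specific test to be used.

```python
import math
from statistics import NormalDist

def sample_size_two_means(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-group comparison of means,
    using the standard normal approximation.

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = 0.80
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Detecting a half-standard-deviation difference with 80% power
# requires about 63 subjects per group under this approximation.
n_per_group = sample_size_two_means(delta=0.5, sigma=1.0)
```

Raising the power from 80% to 90% or shrinking the detectable difference δ drives the required sample size up sharply, which is why researchers usually settle for 80%.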

Power is important for any study, but it is particularly critical for studies conducted to show that 2 or more treatments or diagnostic methods are equivalent. We cannot claim that 2 treatments are equivalent just because we failed to find a difference, even with the minimum acceptable power of 80%. A 20% miss rate is so high that claims of equivalence are not justified. Equivalence studies usually require much higher power and much larger sample sizes than do studies designed to find differences.

Observational Studies

Many studies are observational rather than experimental. In an experimental study, the researcher assigns subjects to the groups studied by means of randomization or some other method. In an observational study, the researcher does not assign subjects to the groups studied. For example, studies conducted to compare certain characteristics among various sheep breeds are necessarily observational. Researchers obviously cannot assign sheep to different breeds. Instead, they obtain and compare samples of sheep from the breeds of interest. Similarly, a study in which female and male veterinarians are compared with respect to their attitudes toward mandatory spay-neuter laws is necessarily observational.

Many observational studies have yielded invaluable information, and many important questions cannot be answered in any other way. However, observational studies always have a high risk of confounded effects. For example, suppose people with high incomes are more likely than people with low incomes to have pet geckos. This difference may be due to some factor other than income. Because people cannot be randomly assigned to receive high incomes (except for lottery winners) or low incomes, confounded effects are a major concern.

An observational study6 involved investigation of the impact of a trap-neuter-return adoption program on a free-roaming cat population. During an 11-year period, free-roaming cats at a university campus were trapped, neutered, and returned to the campus or adopted. At the end of the study, the cat population had decreased by 66%. Although the program was the most likely reason for this reduction, other possibilities (eg, illness) cannot be ruled out.

Observational studies must always be closely examined for confounding. Even if no obvious confounding is evident, the results of such studies must be interpreted with caution because confounded effects may result from characteristics that were not measured. Whenever possible, the results of observational studies should be confirmed or disconfirmed by conducting well-designed experimental studies.

For example, experimental confirmation was sought for results of observational studies on mare reproductive loss syndrome.7,8 In 2001, approximately 25% of pregnant mares in the Ohio River Valley of the United States aborted within several weeks. Observational studies revealed a strong association between this syndrome and the presence of eastern tent caterpillars (ETCs). Results of subsequent experiments, in which researchers fed ETCs to pregnant mares, confirmed that eating ETCs caused mares to abort. Further experiments, in which mares were fed various ETC parts, found that the barbed setae (hairs) on ETCs cause abortions.

Observational studies can be retrospective or prospective. In a retrospective study, medical records or other data sources are examined to collect data on subjects that have already experienced the events and outcomes of interest. In a prospective study, subjects have not experienced all of the events and outcomes of interest and are followed over time to collect data on these events and outcomes as they occur.

Cross-sectional studies, case-control studies, and cohort studies are common types of observational studies. In a cross-sectional study, a sample of subjects is examined at a single point in time. For example, a cross-sectional study9 of osteoarthritis in cats involved a sample of 100 client-owned cats to examine the prevalence and characteristics of osteoarthritis in cats as well as the association of osteoarthritis with behavioral changes. In a case-control study, subjects with a disease or injury (the cases) are compared with subjects without the disease or injury (the controls) to identify possible risk factors for that disease or injury. For example, a case-control study10 of risk factors for scrapie in sheep involved comparison of 61 sheep flocks with at least 1 scrapie case with 61 flocks that had no history of scrapie. In a cohort study, the investigator obtains a sample of subjects that share some characteristic of interest and examines their outcomes over time. For example, a cohort study11 of cancer in Flat-Coated Retrievers was conducted to follow a cohort of 174 dogs for 11 years to investigate their survival rates and causes of death.

Consequences of Design Flaws

When data are analyzed improperly, they can always be reanalyzed correctly (preferably before publication). When data are generated by a study with fundamental design flaws, nothing can be done to correct the damage resulting from confounded effects or an inadequate sample size. The only remedy is to repeat the experiment, using a sound design. This unpleasant fact causes considerable consternation among researchers who hope that their badly designed studies can be patched up statistically. It simply cannot be done. Far too often, a data set is well beyond statistical repair and winds up dead on arrival at the statistician's office.

The best strategy is prevention: avoid confounded studies whenever possible. Researchers should consult a statistician about their studies before the study begins, not after the data have been collected. Design flaws can then be corrected before they sink the entire study. It is not pleasant to endure all the work of completing a study, only to learn afterward that the power is insufficient and the design is completely confounded. Consulting a statistician before the study begins will help ensure a successful study that answers the questions of interest.

References

  • 1. Jung BG, Toan NT, Cho SJ, et al. Dietary aluminosilicate supplement enhances immune activity in mice and reinforces clearance of porcine circovirus type 2 in experimentally infected pigs. Vet Microbiol [published online ahead of print Dec 18, 2009] doi:10.1016/j.vetmic.2009.11.009.

  • 2. Marsella R, Akucewich L. Investigation on the clinical efficacy and tolerability of a 0.4% topical stannous fluoride preparation (MedEquine Gel) for the treatment of bacterial skin infections in horses: a prospective, randomized, double-blinded, placebo-controlled clinical trial. Vet Dermatol 2007; 18:444–450.

  • 3. Jaeger GT, Larsen S, Søli N, et al. Double-blind, placebo-controlled trial of the pain-relieving effects of the implantation of gold beads into dogs with hip dysplasia. Vet Rec 2006; 27:722–726.

  • 4. de Verdier K, Ohagen P, Alenius S. No effect of a homeopathic preparation on neonatal calf diarrhoea in a randomised double-blind, placebo-controlled clinical trial. Acta Vet Scand 2003; 44:97–101.

  • 5. Gore SM, Altman DG. Statistics in practice. London: British Medical Association, 1982.

  • 6. Levy JK, Gale DW, Gale LA. Evaluation of the effect of a long-term trap-neuter-return and adoption program on a free-roaming cat population. J Am Vet Med Assoc 2003; 222:42–46.

  • 7. McDowell KJ, Webb BA, Williams NM, et al. Invited review: the role of caterpillars in mare reproductive loss syndrome: a model for environmental causes of abortion. J Anim Sci 2010; 88:1379–1387.

  • 8. Tobin T. The 2001 Kentucky equine abortion storm: the caterpillar/setal hypothesis of the mare reproductive loss syndrome (MRLS). Available at: www.thomastobin.com/mrlstox.htm. Accessed Mar 1, 2010.

  • 9. Slingerland LI, Hazewinkel HA, Meij BP, et al. Cross-sectional study of the prevalence and clinical features of osteoarthritis in 100 cats. Vet J [published online ahead of print Jan 16, 2010] doi:10.1016/j.tvjl.2009.12.014.

  • 10. Healy AM, Hannon D, Morgan KL, et al. A paired case-control study of risk factors for scrapie in Irish sheep flocks. Prev Vet Med 2004; 64:73–83.

  • 11. Dobson J, Hoather T, McKinley TJ, et al. Mortality in a cohort of flat-coated retrievers in the UK. Vet Comp Oncol 2009; 7:115–121.