Outcome-based veterinary medicine currently lacks reliable validated outcome measures.1–3 Double-blind, controlled studies are seldom used to evaluate outcomes of orthopedic surgery in dogs. On the assumption that chronic pain and lameness are not equivalent to a particular type of behavior but instead underlie that behavior,4 chronic pain questionnaires designed to be used by dog owners have been evaluated.5–8 At the University of Helsinki, we evaluated a simple multifactorial descriptive pain questionnaire in Finnish. Its index is the sum of scores for 11 questions (ie, items) that were easily applicable to all kinds of dogs, owners, and environments and that each differed significantly between dogs with chronic signs of pain caused by osteoarthritis and healthy control dogs without pain.5 We propose to call the resulting index the Helsinki chronic pain index (HCPI).
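As an illustration of how an index of this kind is formed, the following minimal sketch sums 11 item scores into a single value. The function name and the 0-to-4 scoring range per item are assumptions made for the example; the scoring range is not stated in this section.

```python
from typing import Sequence

N_ITEMS = 11  # the index described above is built from 11 items


def summed_index(item_scores: Sequence[int],
                 min_score: int = 0,
                 max_score: int = 4) -> int:
    """Sum the item scores of one questionnaire into a single index.

    The default 0-4 range per item is an assumption for illustration only.
    """
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if not min_score <= score <= max_score:
            raise ValueError(f"item score {score} is outside {min_score}-{max_score}")
    return sum(item_scores)


# Hypothetical answers from one owner, one score per item
example_answers = [1, 2, 0, 3, 4, 1, 0, 2, 1, 0, 1]
example_index = summed_index(example_answers)
```

Rejecting incomplete or out-of-range questionnaires, rather than summing them silently, keeps index values comparable between dogs.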
Before use, it must be determined that an index is valid, reliable, and sensitive.9 Many tests exist to determine these factors, and no single test can unequivocally prove the worth of an index, but together, test results can strengthen an index.9 Methods are chosen partly on the basis of how data are gathered and the preferences of the researchers. Part of the validity testing for our index was performed in an earlier study.5
Validity is the ability of a scale or index to measure what it is supposed to measure.9,10 Face validity4,10,11 is the extent to which the scale or index, after it is constructed, is subjectively viewed by knowledgeable individuals (eg, veterinarians) as covering the concept, such that each item in the questionnaire measures chronic pain in some way. Content validity4,9–11 is related to face validity, being based on logic and expertise. It asks whether the scale or index covers all of the generally accepted variables of, for example, chronic pain (ie, is it sufficiently comprehensive?). To achieve face and content validity, researchers must identify the items that best assess chronic pain. This involves a long pretrial process and, in the case of our index, started a year before the final questionnaire was administered.5 Question topics for the first items were gathered from the clinical experience of the authors, previous research, literature, and informal interviews with owners and colleagues. Questions were tested repeatedly until every ambiguous or poorly worded question had been deleted or rewritten and retested. This process yielded 25 items that were tested in a clinical setting.5 Of these, 14 items were dropped because they were not applicable to all owners (eg, stair climbing), were not easily understood (eg, pacing), or did not reveal a significant difference between healthy and diseased dogs (eg, appetite), resulting in an 11-item index.5 Criterion (also called predictive or concurrent) validity4,9–11 describes the correlation between a scale and an external measurement of the same phenomenon.
In the study reported here, criterion validity was assessed against a QOL question7,8 and a mobility VAS.12,13 Construct validity4,9–11 by extreme groups,10 in which dogs with pain caused by osteoarthritis were compared with healthy dogs with no pain, was used in our previous study.5 In the study reported here, construct validity was assessed by use of PCA.4,14 The primary application of this technique in scale development is to reduce the number of items and to detect a structure in the relationship between items (ie, to determine how many latent constructs underlie a set of items).4 An important aspect of the construct extraction is that the solution has to be interpretable. Therefore, several solutions are usually explored and the one that makes the best sense is chosen. Usually, the construct extraction is done at a single time point, but if several similar evaluations are at the disposal of researchers, it is possible to check the stability of the construct solution by rechecking it at various time points.8,10,14,15
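The idea behind the component extraction described above can be sketched as follows. This is a minimal illustration, not the analysis performed in the study. It extracts only the first principal component of the item correlation matrix by power iteration, and the score matrix and function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def correlation_matrix(rows):
    """Pearson correlations between items.

    rows holds one list of item scores per dog; every item must vary
    between dogs, otherwise its standard deviation is zero.
    """
    cols = list(zip(*rows))
    n = len(rows)
    means = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    k = len(cols)
    return [[sum((cols[i][r] - means[i]) * (cols[j][r] - means[j])
                 for r in range(n)) / (n * sds[i] * sds[j])
             for j in range(k)]
            for i in range(k)]


def first_component(corr, iterations=500, tol=1e-12):
    """Leading eigenvalue and unit eigenvector of corr, by power iteration."""
    k = len(corr)
    # Slightly uneven start vector, so that it is unlikely to be exactly
    # orthogonal to the leading eigenvector.
    vec = [1.0 + 0.01 * i for i in range(k)]
    eigenvalue = 0.0
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        nxt = [x / norm for x in nxt]
        # Rayleigh quotient of the normalized vector estimates the eigenvalue
        new_eigenvalue = sum(nxt[i] * sum(corr[i][j] * nxt[j] for j in range(k))
                             for i in range(k))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        vec, eigenvalue = nxt, new_eigenvalue
        if converged:
            break
    return eigenvalue, vec


# Hypothetical scores: 5 dogs (rows) by 3 items (columns)
scores = [[0, 0, 1], [1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4]]
corr = correlation_matrix(scores)
eigenvalue, weights = first_component(corr)
explained = eigenvalue / len(corr)  # share of total variance on component 1
```

Because the analysis is run on a correlation matrix, the eigenvalues sum to the number of items, so dividing the first eigenvalue by the item count gives the proportion of variance carried by the first component. Repeating the same computation on evaluations from different time points is one way to check whether the component structure is stable.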
Reliability refers to the extent to which the measure yields the same score each time it is administered, all other things being equal.9,10,14 When a questionnaire is designed, its optimal length is estimated on the basis of how the questionnaire will be used. Kline15 has recommended 10 items as the minimum for a reliable test. A longer questionnaire is usually more reliable, whereas a shorter questionnaire places less burden on the responder.4,9 We were aiming for a 10- to 15-item scale with the trade-off of somewhat lower reliability. Several types of reliability10,15 exist, and the reliability of a scale or an index is often measured by use of different techniques that will give somewhat different values.9 Internal consistency or equivalence9 is the first reliability test to perform and estimates how well items that reflect the same construct yield similar results.14 Internal consistency is assessed through the overall correlation among items in the same construct; the Cronbach α value16 is the best-known statistic for this determination. Repeatability (also called stability, test-retest reliability, temporal reliability, or intraobserver reliability) is evaluated by the test-retest method, in which a test is given twice to the same cohort.4,9,10 When the measure is taken at different time points, all other things being equal, scores given by the owners should remain consistent. This is often tested by use of intraclass correlation.10,11,17 The correlation depends on the interval between evaluations; an interval of 1 month between 2 evaluations will give a lower value than would an interval of only a day or a week.
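For internal consistency, the Cronbach α value can be computed from the item variances and the variance of the summed score, by use of the standard formula α = k/(k − 1) × (1 − Σ item variances/variance of total score), where k is the number of items. The following is a minimal sketch with hypothetical data; the function name is an assumption made for the example.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for a score matrix with one row of item scores per dog."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("at least 2 items are needed")
    # Sample variance of each item across dogs
    item_variances = [variance(col) for col in zip(*rows)]
    # Sample variance of the summed score across dogs (must be nonzero)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 4 dogs (rows) by 2 items (columns)
scores = [[1, 1], [2, 3], [3, 2], [4, 4]]
alpha = cronbach_alpha(scores)
```

Items that rise and fall together across dogs make the variance of the summed score large relative to the sum of the item variances, which pushes α toward 1.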
Sensitivity to change or responsiveness10 of the scale reflects the capability of the instrument to measure changes in degrees of pain over time in response to clinical interventions. As intervention, 1 group receives an analgesic treatment (eg, NSAID for osteoarthritis) and is compared with a group that receives a placebo. The analgesic is presumed to affect the degree of pain more than will the placebo.
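One way to examine responsiveness is to compare the change in index score (score after treatment minus score at baseline) between the 2 groups. The following minimal sketch uses an exact 2-sided permutation test on the difference in mean change. The choice of test, the function name, and the change scores are assumptions made for illustration; the test used in the study is not specified in this section.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(treated, placebo):
    """Exact 2-sided permutation P value for a difference in mean change.

    Enumerates every way of splitting the pooled change scores into groups
    of the original sizes, so it is practical only for small groups.
    """
    pooled = list(treated) + list(placebo)
    n_treated = len(treated)
    n_placebo = len(placebo)
    pooled_sum = sum(pooled)
    observed = abs(mean(treated) - mean(placebo))
    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_treated):
        treated_sum = sum(pooled[i] for i in idx)
        diff = abs(treated_sum / n_treated
                   - (pooled_sum - treated_sum) / n_placebo)
        n_splits += 1
        # Small tolerance guards against floating-point ties
        if diff >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical change scores (negative values indicate less pain)
analgesic_change = [-8, -6, -7, -9]
placebo_change = [0, 1, -1, 2]
p_value = exact_permutation_p(analgesic_change, placebo_change)
```

A responsive index would be expected to yield a small P value for the analgesic-versus-placebo comparison, whereas 2 groups that change similarly would yield a large one.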
The purpose of the study reported here was to validate our previous questionnaire and confirm that psychometric properties of the questionnaire are reliable. We hypothesized that the HCPI we previously developed to quantify owner-assessed chronic pain of dogs with osteoarthritis is valid and reliable. Our method to show validity and reliability was to test the following: construct validity, where component analysis suggests a stable component structure of the HCPI; criterion validity, by comparing a change in the HCPI with 2 other scales that are thought to change because of chronic pain; internal consistency, for which a Cronbach α value > 0.7 would indicate a correlation among items of the components; repeatability, for which correlations of r > 0.7 would indicate a good test-retest reliability; and responsiveness, for which a significant difference in the HCPI and its items as a result of medication administration (but not without medication administration) would indicate sensitivity of the HCPI to detect change.
HCPI: Helsinki chronic pain index
ITT: Intention to treat
PCA: Principal component analysis
QOL: Quality of life
VAS: Visual analogue scale
Cartrophen vet. injectable 100mg, Biopharm Pty Ltd, Bondi Junction, NSW, Australia.
Rimadyl, 50-mg tablets, Pfizer, Espoo, Finland.
StatXact-8, Cytel Software Corp, Cambridge, Mass.
SPSS, version 12.0, SPSS Inc, Chicago, Ill.
Schulz KS, Cook JL, Kapatkin A, et al. Commentary. Evidence-based surgery: time for change. Vet Surg 2006;35:697–699.
Cook JL. Outcomes-based patient care in veterinary surgery: what is an outcome measure? Vet Surg 2007;36:187–188.
Kapatkin AS. Outcome-based medicine and its application in clinical surgical practice. Vet Surg 2007;36:515–518.
Hielm-Björkman AK, Kuusela E, Liman A, et al. Evaluation of methods for assessment of pain associated with chronic osteoarthritis in dogs. J Am Vet Med Assoc 2003;222:1552–1558.
Hudson JT, Slater MR, Taylor L, et al. Assessing repeatability and validity of a visual analogue scale questionnaire for use in assessing pain and lameness in dogs. Am J Vet Res 2004;65:1634–1643.
Wiseman-Orr ML, Scott EM, Reid J, et al. Validation of a structured questionnaire as an instrument to measure chronic pain in dogs on the basis of effects on health-related quality of life. Am J Vet Res 2006;67:1826–1836.
Brown DC, Boston RC, Coyne JC, et al. Development and psychometric testing of an instrument designed to measure chronic pain in dogs with osteoarthritis. Am J Vet Res 2007;68:631–637.
Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 2nd ed. New York: Oxford University Press, 1995;4–161.
Welsh EM, Gettinby G, Nolan AM. Comparison of a visual analogue scale and a numerical rating scale for assessment of lameness, using sheep as a model. Am J Vet Res 1993;54:976–983.
Quinn MM, Keuler NS, Lu Y, et al. Evaluation of agreement between numerical rating scales, visual analogue scoring scales, and force plate gait analysis in dogs. Vet Surg 2007;36:360–367.
Tabachnick BG, Fidell LS. Principal components and factor analysis. In: Using multivariate statistics. 5th ed. Boston: Allyn and Bacon, 2007;607–675.
Hielm-Björkman A, Tulamo R-M, Salonen H, et al. Evaluating a complementary therapy for moderate to severe canine osteoarthritis. Part I: green-lipped mussel (Perna canaliculus). Evid Based Complement Alternat Med 2007;doi: 10.1093/ecam/nem136.
Hielm-Björkman A, Tulamo R-M, Salonen H, et al. Evaluating a complementary therapy for moderate to severe canine osteoarthritis. Part II: a homeopathic combination preparation (Zeel). Evid Based Complement Alternat Med 2007;doi: 10.1093/ecam/nem143.
Begg C, Cho M, Eastwood S, et al. Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 1996;276:637–639.
Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996;17:1–12.
Farrar JT, Portenoy RK, Berlin JA, et al. Defining the clinically important difference in pain outcome measures. Pain 2000;88:287–294.
Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomized trials. Lancet 2001;357:1191–1194.
MacPherson H, White A, Cummings M, et al. Standards for reporting interventions in controlled trials of acupuncture—the STRICTA recommendations. Acupunct Med 2002;20:22–25.
Holtsinger RH, Parker RB, Beale BS, et al. The therapeutic efficacy of carprofen (Rimadyl-V) in 209 clinical cases of canine degenerative joint disease. Vet Comp Orthop Traumatol 1992;5:140–144.
Brown DC. Sources and handling of losses to follow-up in parallel-group randomized clinical trials in dogs and cats: 63 trials (2000–2005). Am J Vet Res 2007;68:694–698.
Cattell RB. A guide to statistical techniques. In: The scientific use of factor analysis in behavioral and life sciences. New York: Plenum Press, 1978;17–32.
Waxman AS, Robinson DA, Evans RB, et al. Relationship between objective and subjective assessment of limb function in normal dogs with an experimentally induced lameness. Vet Surg 2008;37:241–246.
Vasseur PB, Johnson AL, Budsberg SC, et al. Randomized, controlled trial of the efficacy of carprofen, a nonsteroidal anti-inflammatory drug, in the treatment of osteoarthritis in dogs. J Am Vet Med Assoc 1995;206:807–811.