
Research Report

Training Users in the Gross Motor Function Measure: Methodological and Practical Issues

Background and Purpose. The Gross Motor Function Measure (GMFM) is a criterion-referenced observational measure for assessing change in gross motor function for children with cerebral palsy (CP). The purposes of this report are to present data on the effects of training pediatric developmental therapists to administer and score the GMFM and to discuss some practical and methodological issues associated with training. Subjects and Methods. A weighted kappa estimate, obtained before and after a training workshop, was used to determine participants' agreement of scoring a videotaped GMFM assessment against experts' scoring of the same videotaped assessment. Several children with CP, representing a spectrum of ages, severities, and levels of function, were shown on the videotape. Results. There was a significant improvement in agreement, from a mean kappa of .58 to .82 (t=15.38, df=75, P<.001) for the first group and from .81 to .92 (t=10.91, df=72, P<.001) for the second group, following training. Conclusion and Discussion. Although there are a number of advantages to using videotapes to train test users and to assess scoring reliability, this method does not evaluate participants' ability to administer the measure. Further work is needed to determine whether reliability is maintained in a clinical situation in which it is necessary to both administer and score the GMFM. [Russell DJ, Rosenbaum PL, Lane M, et al. Training users in the Gross Motor Function Measure: methodological and practical issues. Phys Ther. 1994;74:630-636.]
Key Words: Cerebral palsy, Motor function, Reliability, Training methods, Videorecordings.

Dianne J Russell, Peter L Rosenbaum, Mary Lane, Carolyn Gowland, Charles H Goldsmith, William F Boyce, Nancy Plews

DJ Russell, MSc, is Research Coordinator (NCRU), Department of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University, Bldg 74, Chedoke Campus, Hamilton, Ontario, Canada L8N 3Z5. P Rosenbaum, MD, FRCP(C), is Professor, Department of Pediatrics, Faculty of Health Sciences, McMaster University, and Director of Pediatrics, Chedoke Child and Family Centre, Chedoke-McMaster Hospitals, Hamilton, Ontario, Canada L8N 3Z5. Address all correspondence to Dr Rosenbaum. M Lane, Dip P&OT, is Physiotherapist, Clinical Consultant to Halton Parent-Infant Program, Oakville, and Gross Motor Measures Group, Hamilton, Ontario, Canada L6J 6E1. C Gowland, MHSc, PT, is Associate Professor, School of Occupational Therapy and Physiotherapy, Faculty of Health Sciences, McMaster University, and Research Manager, Physiotherapy Department, Chedoke-McMaster Hospitals. CH Goldsmith, PhD, is Professor, Department of Clinical Epidemiology and Biostatistics, Faculty of Health Sciences, McMaster University.
WF Boyce, MSc, PT, is Assistant Professor, School of Rehabilitation Therapy, Queen's University, Kingston, Ontario, Canada K7L 5G2. N Plews, BHSc, PT, is Research Physiotherapist, Chedoke-McMaster Hospitals.

Development of a clinical test is a complex and time-consuming process. Researchers usually spend the majority of their effort establishing that the test is reliable (consistent) and valid (measuring what it is supposed to measure). After the test has been published, there is often little time, money, or energy left to evaluate how to teach others to use the test in an appropriate manner, let alone to assess the impact of the training on new users. Therapists should, however, know whether they are using tests in a manner that produces reliable measurements so that they can have confidence in their ability to attribute a change in score to changes in patient function rather than to measurement error.1

This article was submitted February 8, 1993, and was accepted January 24, 1994.



Table. Review of Manuals for Suggested Training Methods

Testsb reviewed: BSID, BOTMP, PDMS, MAP, PEDI, GDS

Criteria for Suggested Training or Examiner Qualificationsa (each manual was rated + or - on the following):
- Experience with children
- Experience with standardized testing procedures
- Read and be familiar with manual and administration guidelines
- Practice
- Suggested testing for reliability
- Formal training offered by test developers or detailed explanation of how to train oneself
- Formal evaluation of training methods

aPlus sign (+) indicates manual meets the criterion; minus sign (-) indicates manual does not meet the criterion or information is missing. bBSID=Bayley Scales of Infant Development, BOTMP=Bruininks-Oseretsky Test of Motor Proficiency, PDMS=Peabody Developmental Motor Scales, MAP=Miller Assessment for Preschoolers, PEDI=Pediatric Evaluation of Disability Inventory, GDS=Gesell Developmental Scales.

All measurements can be affected by several sources of variation, which can affect the reliability of the measure, including factors within the examinee (subject of the assessment), the examination (or test), the examiner (user), and the environment (context).2 Some important variables of the examinee are age, functional activity level, and degree of disability. The length of the assessment and the clarity of the administration guidelines are factors that may vary in the examination. Factors associated with the environment include the test setting (eg, room), temperature, and time of day. Other factors thought to be less controllable, but relevant, are patient compliance, age and background experience of the examiner, the examiner's familiarity with the examinee, and the method of assessment (direct contact or analysis of videotaped activities). One major controllable source of variation is the training of potential test users in the background, concepts, and application of the test. Rothstein3 states that when evaluating tests for clinical use, it is important to consider population-specific reliability for the particular group being measured and for the type of people administering the measures. It is useful to know the type of patients used in the reliability studies (and whether they are similar to the subjects who will be assessed) and the level of training of the examiners. It is important to know whether the examiners were a sample of typical therapists or whether they were part of the team developing the measure and therefore probably more expert in its administration and scoring. For reliability to be generalizable, reliability testing needs to be conducted with people thought to be typical of the users of the test.

When considering whether to incorporate a new test for research or clinical practice, it is important to determine whether the test manual provides advice, or preferably evidence, about the best methods for training. The Standards for Tests and Measurements in Physical Therapy Practice require primary test purveyors or test developers to include in a test manual ". . . descriptions of the qualifications and competencies needed by test users. These descriptions should include statements regarding potential consequences of unqualified users administering the test."1(p590) Test manuals should also ". . . describe how potential test users can obtain the competencies necessary to administer the tests."1(p598) Among the standards for clinicians using tests is the following:

Test users must be able to determine before they use a test whether they have the ability to administer that test . . . based on an understanding of the test user's own skills and knowledge (competency) as compared with the competencies described by the test purveyor.1

Stengel4 reviews a number of tests for assessing motor development in nonnewborn children for the tests' reliability, validity, and usefulness to clinicians. He chose the tests because they are comprehensive, familiar to investigators studying the management of children with neurologic dysfunction, and readily available. These tests include the Bayley Scales of Infant Development,5 Bruininks-Oseretsky Test of Motor Proficiency,6 Peabody Developmental Motor Scales,7 Miller Assessment for Preschoolers,8 Pediatric Evaluation of Disability Inventory,9 and Manual of Developmental Diagnosis.10 Stengel states that the tests are ". . . fairly easy to learn to administer without the need for extensive special instruction,"4 but he presents no evidence to support this contention.

To determine how well test manuals addressed the issue of training, the manuals from the nonnewborn pediatric tests identified by Stengel4 were reviewed by the primary author (DJR). The results of this review are presented in the Table. Most of the manuals recommend that therapists have experience with children, experience with standardized testing procedures, and practice as important factors in learning their measures, but few manuals actually explain how to obtain the necessary skills to ensure competency.


The Pediatric Evaluation of Disability Inventory9 manual has the most detailed description of procedures for training. The authors of the manual advocate attending a training workshop; however, they also suggest methods of training with an experienced examiner. Overall, the authors recommend "high agreement" with an experienced examiner, but they do not specify a particular level of reliability. Case scenarios are also provided, which can be scored by test users to evaluate their reliability compared with the authors' scoring. Although formal training may be available for some of the measures reviewed, information on training is not presented in the Table if this information was not included in the test manual. We are not aware that any authors have evaluated the effects of training.

The Gross Motor Function Measure (GMFM) was developed for use by pediatric physical therapists as an evaluative measure for assessing change over time in gross motor function of children with cerebral palsy. The GMFM is an 88-item, criterion-based observational measure that assesses motor function in five "dimensions": (1) lying and rolling; (2) sitting; (3) crawling and kneeling; (4) standing; and (5) walking, running, and jumping. Each item is scored on a four-point scale (0=does not initiate activity, 1=initiates activity, 2=partially completes activity, and 3=completes activity). Specific descriptions for how to score each item are found in the administration and scoring guidelines contained within the test manual,11 which is available from the primary author (DJR). The results of the initial validation work on the GMFM have been published.12

In the original GMFM validation study, reliability of administering and scoring the GMFM over two occasions was assessed. A small number of developmental pediatric physical therapists familiar with the development of the GMFM and trained in the use of the measure completed interrater (n=11) and intrarater (n=10) reliability testing on a sample of children with cerebral palsy. These children represented a spectrum of ages, diagnostic types, and severities. Using intraclass correlation coefficients (ICC [2,1]),13 reliability estimates were calculated for each dimension as well as for total scores, and these values varied from .87 to .99. Following minor revisions to the items and guidelines, a second reliability study using a balanced incomplete block design was completed by 16 developmental pediatric therapists using the original and the revised guidelines. The therapists involved in this study were not regularly using the GMFM but had undergone some training. Although the initial reliability studies had all therapists administering and scoring the GMFM, this study required therapists to score from videotapes. The results of the study demonstrated ICCs of .75 to .97 between therapists scoring with the old and the new guidelines. Although the range of values for the reliability coefficients was greater in the study using videotapes, they were still high enough for us to conclude that trained therapists could score the modified GMFM reliably.

Because all our reliability and validity data were collected using trained pediatric physical therapists who were involved in the development and validation of the measure, we needed to know whether training was generalizable to those therapists who would be typical users of the test (ie, clinicians working in children's treatment centers). Training would allow test users to determine their competency with scoring the GMFM and allow us to evaluate the value and impact of the training. The purposes of this report are (1) to present data on the effects of training developmental pediatric clinicians in the use of the GMFM using videotapes for training and testing, and (2) to discuss some practical and methodological issues that arose and may be generalizable to other measurement training situations.

Method
A 1-day GMFM training program was developed. The workshops commenced with a description of the research background and psychometric properties of the GMFM, followed by a videotaped pretest (40 minutes including pauses between items). Four children with cerebral palsy (1 with athetosis, 2 with diplegia, and 1 with hemiplegia), varying in age from 2 to 14 years, were shown on the testing videotape. An overview of general concepts in administering and scoring the test was followed by group discussion of the scoring issues for each GMFM item using videotaped examples (4 hours). The teaching videotape showed 3 children (1 with quadriplegia, 1 with diplegia, and 1 with hemiplegia), who varied in age from 6 to 14 years. Later in the afternoon, 45 minutes was spent viewing a videotape and discussing how to calculate a total score and issues related to goal setting. The day ended with the readministration of the same videotape used at the pretest.

The pretest and posttest scores were used to ascertain whether the training workshop had an impact on participants' ability to observe and score a videotaped GMFM assessment. The correct pretest and posttest scores were previously determined by three of the workshop trainers, who viewed and independently scored the testing tapes using the GMFM administration and scoring guidelines. Disagreements were identified and discussed, and the videotaped activities were replayed until consensus on the correct or "criterion" score was achieved.

Before commencing the pretest, participants were given a GMFM manual and instructed to use the administration and scoring guidelines when scoring the test videotape. Before each GMFM item was shown on the videotape, participants were told the item number and the number of trials they would see the child attempt for that item. The tape was stopped between items to allow participants time to score and prepare for the next item. No items were replayed. This protocol was repeated using the same videotape for the posttraining test.
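The total-score arithmetic discussed in the workshop is not reproduced in this report. As a hedged sketch only, assuming the convention described in the GMFM manual11 (each dimension is expressed as a percentage of its maximum possible raw score, and the total score is the mean of the five dimension percentages; the item counts below are those of the 88-item GMFM), the calculation might look like this:

```python
# Hedged sketch of GMFM-style scoring; assumes each dimension score is the
# sum of its item scores (each 0-3) expressed as a percentage of the
# dimension maximum, and that the total score is the mean of the five
# dimension percentages. Item counts follow the 88-item GMFM.
DIMENSIONS = {
    "lying_rolling": 17,
    "sitting": 20,
    "crawling_kneeling": 14,
    "standing": 13,
    "walking_running_jumping": 24,
}

def dimension_percent(item_scores):
    """Sum of item scores as a percentage of the maximum (3 per item)."""
    return 100.0 * sum(item_scores) / (3 * len(item_scores))

def gmfm_total(scores_by_dimension):
    """Mean of the five dimension percentages."""
    percents = [dimension_percent(s) for s in scores_by_dimension.values()]
    return sum(percents) / len(percents)

# Example: a child who partially completes (scores 2 on) every item.
example = {name: [2] * n for name, n in DIMENSIONS.items()}
print(round(gmfm_total(example), 1))  # 66.7
```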


At the time of the pretest, participants completed a questionnaire about their previous clinical experience and their familiarity and experience with the GMFM.

Prior to initiating training workshops, the plan was to develop three separate criterion videotapes, each approximately 20 minutes in duration. Each tape was to contain one third of the total number of items, randomly selected from each of the five GMFM dimensions. These items were examined to ensure that a mixture of items was included (eg, a variety of starting positions, static and dynamic items). A sample of items from various levels of function (from children who were functioning primarily in the first GMFM dimension of lying and rolling activities to those capable of independent ambulation) was shown on each tape. Items were grouped by dimension and shown in numerical order.

The first of the three videotapes was developed according to plan and was used in the first three workshops. Results using this videotape are summarized in the "Training Study A" section. Upon close examination of the individual item scores from workshop participants, it became apparent that one person could have fewer errors than another but have a lower overall estimated kappa. Because criterion videotape "A" did not sample equally all possible GMFM scores (0, 1, 2, and 3), a bias was created. Participants had one chance on videotape "A" to score 0 (indicating "does not initiate movement"), so that if they scored that item incorrectly, they were severely penalized (from a statistical point of view). When making the subsequent videotapes, this inequality was corrected by sampling more equally across GMFM scores. The results using the second videotape are summarized in the "Training Study B" section. The third videotape had not been used extensively at the time this article was written.

Data Analysis
A weighted estimated kappa statistic with a quadratic weight was used to analyze chance-corrected reliability between the rater's scoring and the criterion scoring.14 To get a composite measure of agreement across all categories, a weighted mean of the individual item kappas was calculated as described by Fleiss.14 A kappa of 1.00 would indicate perfect agreement with the criterion scoring, and a kappa of 0.00 would be equal to chance agreement. A kappa statistic using a quadratic weight penalizes the rater more the further away the rater is from the correct score. When a weighted kappa is calculated using quadratic weights, it yields results similar to the ICC and has a similar interpretation.15

A paired t test was used to examine the statistical significance of pretraining and posttraining estimated kappa scores, and an independent-sample t test was used to compare the posttest estimated kappa scores with the criterion test scores. All t tests were two-tailed. A Pearson product-moment correlation (r) was used to measure the relationship of criterion scores with previous clinical experience and previous experience with the GMFM, using SPSS/PC+ version 4.0.16 The .05 level was used to test for statistical significance.
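As a minimal illustration (not the authors' SPSS/PC+ analysis, and pooling all items into a single kappa rather than taking Fleiss's weighted mean of per-item kappas), a quadratically weighted kappa and the paired t test could be computed as follows; all score arrays here are hypothetical:

```python
# Minimal sketch with hypothetical data: quadratically weighted kappa
# against criterion scores, and a paired t test on pretraining vs
# posttraining kappas.
from scipy.stats import ttest_rel
from sklearn.metrics import cohen_kappa_score

def rater_kappa(rater_scores, criterion_scores):
    # Quadratic weights penalize a rater more the further the score is
    # from the criterion, giving results similar in interpretation to the ICC.
    return cohen_kappa_score(rater_scores, criterion_scores,
                             labels=[0, 1, 2, 3], weights="quadratic")

criterion = [3, 2, 0, 1, 3, 2, 1, 0]
rater     = [3, 2, 1, 1, 3, 1, 1, 0]
print(f"weighted kappa = {rater_kappa(rater, criterion):.2f}")

# Hypothetical per-participant kappas before and after the workshop.
pre_kappas  = [0.55, 0.62, 0.48, 0.71]
post_kappas = [0.80, 0.85, 0.79, 0.88]
t, p = ttest_rel(post_kappas, pre_kappas)  # two-tailed by default
print(f"t = {t:.2f}, P = {p:.4f}")
```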

Results

Training Study A

The data for training study A were derived from a total of 76 participants who attended the first three workshops. Eighty-six percent of the participants were physical therapists, 13% were occupational therapists, and 1% were kinesiologists. Workshop participants had a mean of 7.7 years of neurological pediatric experience, which varied from 0 to 25 years. The criterion kappa score for this videotape was set at .70, based on experience from training with the Gross Motor Performance Measure.17

A paired t test comparing the pretraining and posttraining scores to determine whether scores were significantly different for the total group (n=76) showed a statistically significant improvement in reliability, from a mean estimated kappa of .58 to .82 (t=15.38, df=75, P<.001). The posttest mean estimated kappa of .82 was also significantly higher than the criterion of .70 (t=10.2, df=75, P<.001). Eight percent of the workshop participants reached the criterion level of reliability on the pretest, and 84% reached the criterion on the posttest.

Training Study B

The data for training study B came from 73 participants who attended four workshops that followed those reported in training study A. Eighty-three percent of the participants were physical therapists, 13% were occupational therapists, and 4% were early childhood educators. Workshop participants had a mean of 6.6 years of neurological pediatric experience, which varied from 0 to 20 years. Based on experience from training study A and guidelines in the literature regarding acceptable levels of reliability,18 the criterion kappa score for this second videotape was raised from .70, indicating good reliability, to .80, indicating excellent reliability.

The results of the paired t test comparing the pretraining and posttraining scores of the total group (n=73) showed a statistically significant improvement in reliability, from a mean estimated kappa of .81 to .92 (t=10.91, df=72, P<.001). The posttest mean estimated kappa of .92 was significantly higher than the criterion of .80 (t=18.64, df=72, P<.001). Sixty-three percent of the workshop participants reached the criterion level of reliability on the pretest, and 100% reached the criterion level on the posttest. Of the 46 workshop participants who reached the criterion level on the pretest, there was still a significant improvement in criterion scores after training, from a mean estimated kappa of .86 to .93 (t=7.86, df=45, P<.001).


The number of years of pediatric neurological experience was correlated with the estimated kappa values for the entire sample (n=149) to determine whether experienced clinicians were more reliable than less experienced clinicians. This was not the case at the pretest, with years of pediatric neurological experience correlating at r=-.04 (t=0.54, df=147, P>.05). At the posttest, there was a small negative correlation of r=-.16 (t=1.92, df=147, P<.05), which was statistically significant but accounts for less than 3% of the variance. There was no significant correlation (r=-.06, t=-0.84, df=147, P>.05) between years of pediatric neurological experience and improvement in estimated kappa scores.


Discussion

The results of these studies demonstrate that clinicians who attend a 1-day GMFM training workshop improve their scoring reliability significantly when tested using videotaped assessments. Note, however, that the methods used in this study relate to evaluating the reliability of scoring a videotape and do not take into account other sources of variability that are present when clinicians are assessing children in the clinic (eg, variability due to different testers, children, and environments). Whether reliability values obtained using videotapes are higher or lower than those obtained using real-life assessments was not addressed as part of this study. There are good reasons to believe either that real-life assessments might be easier to perform and hence more reliable (eg, because more information is available to the examiner than can be provided on a videotape) or that these assessments are more difficult to perform and hence less reliable (eg, because the assessor must simultaneously test and score), so empirical studies are needed to address these questions appropriately. Results from reliability work with the Gross Motor Performance Measure show high levels of intrarater reliability when therapists administer and score an assessment and then rescore the videotape of the same assessment 6 weeks later. Boyce et al17 report ICCs (2,1) varying from .90 to .97 on individual attribute scores and .93 overall.

There was a marked difference in the pretest reliability scores and in the number of people reaching the criterion level of reliability, depending on which testing videotape was used to assess scoring. Higher pretest scores in training study B could have been due to the finding that more than twice as many participants in training study B had reported reading the administration guidelines prior to the workshop (23% as compared with 10% in training study A). We have not yet evaluated separately this particular source of variation in trainee skill (ie, how much familiarity with the material prior to the training workshop influences success in reaching a criterion level of reliability). Further work is needed to assess whether a certain amount of practice with the measure prior to testing would be sufficient to reach a criterion level of reliability, without the need of a formal workshop. It is important to note that although 63% of therapists reached the criterion score on the pretest in training study B, their scores still improved significantly following the workshop. Interestingly, our results showed that years of pediatric experience had little effect on the participants' ability to learn to administer the GMFM, and we believe that, from what we currently know, years of experience should not preclude people from undergoing the training process.

The videotape used in training study B appeared to be much easier to score than the videotape used in training study A. Things we learned after preparing the first videotape were used to improve the second videotape. These improvements included a longer "lead-in" time prior to the desired movement and a more equal sampling of scores across the total number of response options. By chance, we may also have sampled some items with less contentious scoring issues on the second videotape.
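The ICC (2,1) cited in these reliability studies is the single-rater, two-way random-effects coefficient of Shrout and Fleiss.13 As a hedged sketch (not any of the original analyses), it can be computed from long-format ratings with, for example, the pingouin package; the data below are invented for illustration:

```python
# Hypothetical sketch of an ICC(2,1) computation (Shrout & Fleiss),
# not the original validation analysis. Long format: one row per
# (child, rater) pair.
import pandas as pd
import pingouin as pg

# Toy data: 4 children, each scored by the same 3 raters.
df = pd.DataFrame({
    "child": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "rater": ["a", "b", "c"] * 4,
    "score": [62, 60, 65, 30, 33, 31, 85, 88, 84, 45, 44, 47],
})

icc = pg.intraclass_corr(data=df, targets="child", raters="rater",
                         ratings="score")
# "ICC2" is the two-way random-effects, single-rater ICC(2,1).
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC"]])
```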

An important consideration in all reliability studies is the need to sample the range of performance across the range of items. For example, if therapists were determining their agreement of scoring the GMFM with a child who was an independent ambulator, it is likely that the child would score "3" (completes independently) on most items in the first three dimensions (lying and rolling, sitting, crawling and kneeling). Therapists would have a high level of agreement strictly because there was little room for disagreement. A more credible estimate of whether therapists agree would be determined by sampling more items for which the child is likely to have a mixture of item scores (0s, 1s, 2s, and 3s), as would be the case (in this example) in the higher two GMFM dimensions (standing; walking, running, and jumping). By including samples of GMFM items from children performing across the spectrum of function, a more realistic estimate of agreement is obtained.
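A toy calculation (hypothetical data, not from the study) makes this concrete: with the same single one-point disagreement out of 20 items, chance-corrected agreement collapses when the criterion scores cluster in one category but remains excellent when the scores span all four categories:

```python
# Hypothetical demonstration: identical raw agreement (19 of 20 items),
# very different quadratically weighted kappas.
from sklearn.metrics import cohen_kappa_score

# Case 1: criterion scores cluster at 3; rater misses the lone 2 by one point.
criterion_clustered = [3] * 19 + [2]
rater_clustered     = [3] * 20

# Case 2: criterion scores span 0-3; rater makes one one-point error (3 -> 2).
criterion_spread = [0, 1, 2, 3] * 5
rater_spread     = [0, 1, 2, 3] * 4 + [0, 1, 2, 2]

for name, crit, rater in [("clustered", criterion_clustered, rater_clustered),
                          ("spread", criterion_spread, rater_spread)]:
    k = cohen_kappa_score(rater, crit, labels=[0, 1, 2, 3], weights="quadratic")
    print(f"{name}: kappa = {k:.2f}")
```

Despite identical raw agreement, the weighted kappa is 0.0 in the clustered case and about .98 in the spread case, which mirrors the penalty described for videotape "A" above.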

Primary purveyors of measures usually spend a great deal of time developing and validating a new instrument and collecting normative data. Generally, a much smaller amount of effort is directed toward issues of training. Although clinicians have a responsibility to acquire the necessary training before using a new measure, it is often not clear what the necessary training is, or how to acquire it in a systematic and effective manner. The time and cost associated with setting up a training package have likely been deterrents to its development.

Several methodological issues were considered in planning this evaluation of the impact of the GMFM training program. Precautions were taken to minimize a learning effect as a result of doing the pretest. Workshop participants were not given any feedback on their performance on the test, either following the pretest or during the workshop. A separate videotape with different children was used during the training. The pretest videotape was used again at the posttest. Had we used different testing videotapes, this would have added another source of variation, and any differences in pretest-posttest scores might have been due to variability in the videotapes.

Although written feedback from participants indicated the training workshops were beneficial for them, with each new testing videotape (encompassing different items and therefore different issues), the workshop trainers learned more about problematic wording and scoring issues with the GMFM. This has allowed for revisions to the test manual19 (available from the primary author) to highlight difficult training issues currently being dealt with in the workshop. We do not yet know whether the second edition of the manual will provide untrained users of the GMFM with a clearer set of directions for self-learning of the measure. To make training more accessible to therapists who are unable to attend a workshop, the Gross Motor Measures Group has developed a videodisc training package that contains videotape examples of children similar to those used for a workshop, along with a written commentary. This method will need to be evaluated to determine whether individuals learning by the videodisc can reach similar levels of scoring reliability as workshop participants.

There are a number of disadvantages and advantages to the use of videotapes as a medium for training and evaluating new users of a test such as the GMFM. One of the main disadvantages of using criterion videotapes to assess reliability is that this method tests only the participant's ability to score the videotaped test reliably and does not provide an indication of the assessor's ability to administer and score the test in a clinical situation. For example, can the examiner elicit appropriate responses from the examinee as well as score them reliably? This is particularly important for a test that involves direct observation of performance rather than being scored from videotaped assessments. This aspect of learning and performing the GMFM needs to be studied further by examining the reliability of workshop participants in a clinical situation and comparing the reliability with that achieved in the workshop.

Another problem with using videotapes is the quality of videotaping, in particular, the ability to capture on videotape, from the best possible camera angle, the movement the therapist is trying to test. Experience has shown it may be more difficult to judge whether a child is "initiating" a movement from videotape than from real life. We relied on the use of expert audiovisual personnel to develop our training materials in an effort to address and overcome these problems.

There are, however, a number of advantages to using videotapes as a method of assessing reliability. First, it is possible to evaluate the effects of an intervention (such as a training workshop) in a standardized manner. Second, the use of videotapes allows an efficient means of assessing several patients of varying diagnostic and functional levels while eliminating the issue of patient compliance. This advantage is particularly appealing when dealing with children. Videotapes can be edited to ensure they are capturing different training issues and covering an appropriate spectrum of function. Third, by having a criterion testing videotape with the "correct" score, as determined by experts, the therapist can ensure that responses are not only reliable but valid. For example, if therapists in the clinical setting learn an assessment together, they may make an administration or scoring decision that is different from the intent of the test developer. When the therapists then assess interrater reliability, it may be high because everyone agrees on how to score, but the score is not the correct (valid) one. Finally, another use for criterion testing videotapes is to have an easy method of assessing ongoing levels of competency. Tests can be completed at regular intervals to ensure that high levels of reliability are maintained over time. Gross20 and Gross and Conrad21 offer further discussion of the advantages and disadvantages of using videotape to capture observational data.

The issue of how reliable is reliable enough is an interesting one. Although a number of guidelines are suggested in the literature,15,18,22 this is still an arbitrary decision. Streiner and Norman23 suggest that an acceptable level of reliability is dependent on the size of the sample, and they point out that clinical assessments used to make decisions about individuals need to be more reliable than those using grouped data. This is because data that are grouped (as in research studies) and used as the mean of several individuals have smaller measurement error. A reliability coefficient itself does not let the therapist know how many errors were made, which is why we provided participants who did not reach our preset criterion level with feedback on individual item problems. A test does not have a single level of reliability; therefore, identifying sources of variability is useful because this information can be used to try to reduce large sources of error variance.23 As clinicians and researchers, we want to be as reliable as possible to make valid decisions about the management of children. In our work, we chose to increase the criterion level of reliability required with the second videotape, based on our experience with the first videotape, to ensure a more rigorous level of reliability.
As we have tried to illustrate in this communication, there are methodologic and design features that can be used to address these issues. It is clear that as much care is needed in the preparation and testing of training in the use of a test as in its creation and validation. We believe that primary purveyors have a responsibility to their clinical colleagues and that they can learn useful and important lessons about their measure while providing training in its use.

Summary

We have shown that training improves workshop participants' agreement of scoring a videotaped GMFM assessment. Although there are a number of advantages to using videotapes to train test users and assess scoring reliability, this method does not evaluate participants' ability to administer the measure. Therefore, further work is needed to determine whether reliability is maintained in a clinical situation in which it is necessary to both administer and score the GMFM.
Acknowledgments

We gratefully acknowledge the contribution of data and thoughtful comments and questions from participants of the training workshops. We also thank Jim Chen for his computer assistance and Marilyn Marshall and Gerry Karlovic for the preparation of this manuscript.
References
1 Task Force on Standards for Measurement in Physical Therapy. Standards for tests and measurements in physical therapy practice. Phys Ther. 1991;71:589-622.
2 Sackett DL, Haynes RB, Guyatt GH, Tugwell P. Clinical Epidemiology: A Basic Science for Clinical Medicine. 2nd ed. Boston, Mass: Little, Brown and Co Inc; 1991.
3 Rothstein JM. Measurement and clinical practice: theory and application. In: Rothstein JM, ed. Measurement in Physical Therapy. New York, NY: Churchill Livingstone Inc; 1985:1-46.
4 Stengel TJ. Assessing motor development in children. In: Campbell SK, ed. Pediatric Neurologic Physical Therapy. New York, NY: Churchill Livingstone Inc; 1991:33-65.
5 Bayley N. Bayley Scales of Infant Development. New York, NY: The Psychological Corporation; 1969:5.
6 Bruininks RH. Bruininks-Oseretsky Test of Motor Proficiency. Circle Pines, Minn: American Guidance Service; 1978:42.
7 Folio MR, Fewell RR. Peabody Developmental Motor Scales and Activity Cards. Allen, Tex: DLM-Teaching Resources; 1983:13.
8 Miller LJ. Miller Assessment for Preschoolers. Littleton, Colo: The Foundation for Knowledge in Development; 1982:3.
9 Haley SM, Coster WJ, Ludlow LH, et al. Pediatric Evaluation of Disability Inventory (PEDI): Development, Standardization and Administration Manual (Version 1). Boston, Mass: New England Medical Center Hospital; 1992:80-88.
10 Knobloch H, Stevens S, Malone AF. Manual of Developmental Diagnosis. Rev ed. New York, NY: Harper & Row; 1980.
11 Russell DJ, Rosenbaum PL, Gowland C, et al. Manual for the Gross Motor Function Measure: A Measure of Gross Motor Function in Cerebral Palsy. Hamilton, Ontario, Canada: McMaster University; 1990.
12 Russell DJ, Rosenbaum PL, Cadman DT, et al. The gross motor function measure: a means to evaluate the effects of physical therapy. Dev Med Child Neurol. 1989;31:341-352.
13 Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979;86:420-428.
14 Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons Inc; 1981:219.
15 Kramer MS, Feinstein AR. Clinical biostatistics LIV: the biostatistics of concordance. Clin Pharmacol Ther. 1981;29:111-123.
16 Norusis MJ. SPSS/PC+ Statistics 4.0 for the IBM PC/XT/AT and PS/2. Chicago, Ill: SPSS Inc; 1990.
17 Boyce WF, Gowland C, Rosenbaum PL, et al. Gross Motor Performance Measure for children with cerebral palsy: study design and preliminary findings. Can J Public Health. 1992;83(suppl):S34-S40.
18 Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159-174.
19 Russell D, Rosenbaum P, Gowland C, et al. Gross Motor Function Measure Manual. 2nd ed. Hamilton, Ontario, Canada: McMaster University; 1993.
20 Gross D. Issues related to validity of videotaped observational data. West J Nurs Res. 1991;13:658-663.
21 Gross D, Conrad B. Issues related to reliability of videotaped observational data. West J Nurs Res. 1991;13:798-803.
22 Law M. Measurement in occupational therapy: scientific criteria for evaluation. Canadian Journal of Occupational Therapy. 1987;54:133-138.
23 Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. New York, NY: Oxford University Press Inc; 1989.
