The purpose of the Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) Validation Project was to assess the diagnostic validity of this examination protocol. An overview is presented, including Axis I and II methodology and descriptive statistics for the study participant sample. This paper details the development of reliable methods to establish the reference standards for assessing criterion validity of the Axis I RDC/TMD diagnoses. Validity testing for the Axis II biobehavioral instruments was based on previously validated reference standards.
The Axis I reference standards were based on the consensus of 2 criterion examiners independently performing a comprehensive history, clinical examination, and evaluation of imaging. Intersite reliability was assessed annually for criterion examiners and radiologists. Criterion exam reliability was also assessed within study sites.
Study participant demographics were comparable to those of participants in previous studies using the RDC/TMD. Diagnostic agreement of the criterion examiners with each other and with the consensus-based reference standards was excellent with all kappas ≥ 0.81, except for osteoarthrosis (moderate agreement, k = 0.53). Intrasite criterion exam agreement with reference standards was excellent (k ≥ 0.95). Intersite reliability of the radiologists for detecting computed tomography-disclosed osteoarthrosis and magnetic resonance imaging-disclosed disc displacement was good to excellent (k = 0.71 and 0.84, respectively).
The Validation Project study population was appropriate for assessing the reliability and validity of the RDC/TMD Axis I and II. The reference standards used to assess the validity of Axis I TMD were based on reliable and clinically credible methods.
The Research Diagnostic Criteria for Temporomandibular Disorders (RDC/TMD) (1) specifies a dual-axis diagnostic system for temporomandibular disorders (TMD) supported by a well-operationalized history and examination protocol. The Axis I clinical assessment protocol is designed to render TMD diagnoses, and the Axis II screening instruments assess psychological status and pain-related disability. Together, Axis I and Axis II assessments constitute a comprehensive evaluation consistent with the biopsychosocial health model.(2)
Advancement in our understanding of the prevalence, etiologies, natural progression, and treatment of TMD is dependent on having reliable and valid diagnostic criteria for these disorders. The 1996 NIH Technology Assessment Conference Statement on the Diagnosis and Management of Temporomandibular Disorders noted that an ideal diagnostic classification system should be based on etiology.(3) However, the RDC/TMD Axis I diagnostic protocol, which is based on measurement of signs and symptoms, is the best and most used classification system to date for the epidemiological studies that are needed to understand TMD etiology and mechanisms. (4)
The current RDC/TMD taxonomic system was not intended to be an end product. Ongoing efforts to investigate its validity were anticipated and encouraged when the RDC/TMD was first established in 1992.(1) To date, no comprehensive investigation of the Axis I diagnostic reliability and validity has been reported. Axis II instrument reliability has been demonstrated,(5) but the validity of the Axis II screening instruments for assessing psychological status and pain-related disability in TMD cases has not been adequately demonstrated. Thus, a comprehensive evaluation of RDC/TMD reliability and validity was needed.
Numerous publications have suggested aspects of the RDC/TMD that could be improved to more effectively distinguish TMD cases from controls and differentiate diagnostic subgroups (6–20). The first aim of the current RDC/TMD Validation Project was to rigorously establish the reliability and validity of the RDC/TMD diagnostic protocol in its published form. The second aim was to propose modifications for the protocol that would improve its reliability and validity as a taxonomic system.
We present 6 papers describing the RDC/TMD Validation Project.
The purpose of the current paper is to present (1) an overview of the study methods for assessment of reliability and validity of the RDC/TMD Axis I and II as a taxonomic system used to distinguish TMD subtypes from each other and from normal controls; (2) descriptive data for the RDC/TMD Validation Project study sample; (3) the procedures used for establishing credible Axis I reference standard diagnoses; and (4) clinical examiner and radiologist reliability data supporting the Axis I reference standards.
We used the Standards for Reporting of Diagnostic Accuracy (STARD) nomenclature (26) to allow for clarity in our reporting. The terms and their definitions as they pertain to this study are:
The RDC/TMD Validation Project assessed the criterion validity of the index tests: the original RDC/TMD Axis I diagnostic algorithms and the Axis II psychological status and pain-related disability instruments. Criterion validity is the measure of the validity of an index test when assessed against a credible reference standard. For assessment of Axis I criterion validity, the reference standard diagnoses were based on the consensus of two criterion examiners at each site. The criterion examiners were TMD experts who independently rendered their TMD diagnoses using the criterion examination protocol that was considerably more comprehensive than that specified by the original RDC/TMD protocol. The elements of this comprehensive examination are discussed below.
The evaluation of the RDC/TMD protocol for Axis I was a multi-site collaboration among researchers at the University of Minnesota (UM), the University of Washington (UW), and the University at Buffalo (UB). The study took place at research centers at each of these institutions.
Beginning in August 2003, study participants were consecutively recruited until three-fourths (approximately 550) of the study sample had been enrolled. At this point, it was necessary to institute selective recruitment in order to fill the recruitment goals for the less common TMD diagnoses, including certain of the disc displacement and arthrosis cases. Other subgroups of participants requiring selective recruitment were older age categories for normal participants and TMD pain cases needed for completing Axis II studies. Selective recruitment was continued until study closure in September 2006. Participants were drawn from 2 sources: direct referrals from local health care providers to the respective university-based TMD centers (i.e., clinic cases) and responses to community advertisements (i.e., community controls and cases). Thus, the study sample was a convenience sample that was recruited from both clinic and community sources.
Recruitment was designed to include cases with a full spectrum of TMD signs and symptoms. Participants, ages 18 to 70 years old, entered the study as putative TMD cases or controls based on the inclusion and exclusion criteria listed in Table 1. The inclusion criteria for study eligibility differed from the published RDC/TMD diagnostic criteria by assigning putative case status to individuals who reported a minimum of 1 of the 3 cardinal symptoms of TMD: jaw pain, limited mouth opening, or temporomandibular joint (TMJ) noise. Participants who denied currently having any of these symptoms were enrolled as controls.
IRB approval was obtained at each of the 3 study sites prior to initiating this project. Participants were compensated $200 for their participation in the Axis I and II clinical assessment, $25 for participation in Axis I and Axis II questionnaire test-retest reliability substudies, $75 for participation in the Axis II criterion substudy, and $50 to $200 for participation in examiner reliability substudies.
Methods for the Axis I and an overview of the Axis II procedures are described separately below.
Sample size requirements stipulated a priori for the sensitivity and specificity estimates in this project specified that neither the upper nor the lower confidence bound should differ from the point estimate by more than 0.10. Assuming symmetrical confidence bounds, the half-width for each confidence interval is expressed as $z\sqrt{p(1-p)/N}$, where z is the standard normal critical value for the chosen confidence level, p is the estimated sensitivity or specificity, and N is the number of participants truly positive for a diagnosis as determined by the reference-standard diagnosis. Based on an observed sensitivity or specificity of 0.5 (when the binomial variance is the largest), and with the desired precision defined by upper and lower confidence bounds no more than 0.10 from each sensitivity and specificity point estimate, 100 cases were required for each diagnosis. Each TMD case could potentially present with up to 5 TMD diagnoses: a Group I muscle diagnosis, a Group II disc displacement diagnosis for each of 2 joints, and a Group III diagnosis of arthralgia, arthritis, or arthrosis for each of 2 joints. Recruitment of 600 cases was expected to provide a minimum of 100 TMD diagnoses representing each of the 8 original RDC/TMD subdiagnoses. In addition, we planned for an additional 100 participants with minimal symptoms who would be subclinical with respect to the RDC/TMD diagnostic protocol, but who could qualify as TMD cases based on the consensus of the criterion examiners. Finally, we planned to recruit 100 controls, that is, participants with no current signs or symptoms of TMD who represented 4 age strata: 18 to 30, 31 to 40, 41 to 50, and 51 to 70 years of age. This stratification allowed for selection of a “pool” of controls that, at the time of analysis, could match the age distribution of participants in each of the 8 TMD subgroups. Given the study sample design above, a total of 800 participants was the initial estimated requirement for the study.
The recruitment objectives resulting from this design are described further in the third paper in this series. (22)
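The precision requirement for the sensitivity and specificity estimates can be checked numerically. The following is a minimal sketch, assuming 95% confidence bounds from the normal approximation (z = 1.96); the confidence level is not stated in the text, and the function names are illustrative only.

```python
import math

# Assumption: 95% confidence bounds via the normal approximation (z = 1.96).
Z_95 = 1.96


def ci_half_width(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of a symmetric confidence interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


def min_cases(p: float, max_half_width: float = 0.10, z: float = Z_95) -> int:
    """Smallest N whose half-width does not exceed max_half_width."""
    return math.ceil(z ** 2 * p * (1.0 - p) / max_half_width ** 2)


# Worst case for binomial variance is p = 0.5.
worst_case_half_width = ci_half_width(0.5, 100)
worst_case_min_n = min_cases(0.5)
```

Under these assumptions, the worst-case half-width with 100 cases is about 0.098, and the strict minimum is 97 cases, which is consistent with the stated target of 100 cases per diagnosis.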
Demographic measures of the study population included gender, age, race, education level, and income.(1) Baseline measures to describe the clinical characteristics of the study participants included characteristic pain intensity, (1,27) duration of pain, (1) depression, (1,28,29) nonspecific physical symptoms, (1,28,29) Graded Chronic Pain Scale scores, (1,27) and the number of RDC/TMD Axis I diagnoses present for each case.
One of the index tests to be validated by this project was the published RDC/TMD Axis I diagnostic examination procedure that employs a set of standardized clinical and questionnaire items. Each of the clinical measurements has been well defined with operational criteria (1) and allows for assignment of TMD participants to any of 3 diagnostic groups that include 8 subdiagnoses:
It was required that tests included as part of the reference standards derived from the criterion examination protocol would be simple, reliable, easy to perform, and appropriate for the research setting. Potential Axis I diagnostic tests were drawn from (1) recommendations in the 1992 RDC/TMD monograph (1); (2) conclusions from other research published since 1992; (3) tests recommended by the study’s External Advisory Panel (AP) composed of clinical and research specialists appointed by the NIDCR; and (4) suggestions solicited from members of TMD organizations, including the American Academy of Orofacial Pain. From these recommendations, we developed a list of candidate history questions and examination tests to be considered by the AP. Some proposed tests were ruled out by the AP as being beyond the scope of this study. Such tests included electronic diagnostic systems for assessing joint vibration to potentially detect disc displacements and osteoarthrosis. The AP-vetted diagnostic tests were then operationalized and tested for reliability. The final list of procedures constituting the criterion exam that was performed by the criterion examiners (CEs) is shown in Table 2.
The criterion history data collection included the published RDC/TMD History Questionnaire (1) along with the Supplemental History Questionnaire that was developed and used by the CEs as part of their semistructured participant interview. This supplemental history consisted of 61 questions assessing multiple dimensions of pain in the jaw muscles, TMJ, ear, and temple including whether the pain was changed with jaw movement, function, parafunction and/or rest. It also assessed tension-type headache using operationalized International Headache Society criteria,(30) and history of joint noise, jaw locking, and perceived occlusal change. To measure changes in these variables occurring between study visits, a Supplemental History Follow-up Questionnaire was also developed for use at the second CE visit. These supplemental questionnaires will be described and evaluated in a future publication that will include estimates of their test-retest reliability and their capacity to predict the reference standard diagnoses.
The criterion examination protocol included all the measures as operationalized in the RDC/TMD. These measures were performed according to the published RDC/TMD specifications. (1) In addition, the criterion examination was composed of several previously described examination procedures, including joint-play tests (i.e., traction, translation, and compression),(31–33) static and dynamic tests, (31,34) soft and hard end-feel,(35) algometry, (36) bite test with unilateral and bilateral placement of cotton rolls, (35,37) and a 1-minute clench. (38) New tests for the criterion protocol were the myofascial palpation test and the modified joint palpation test. The myofascial palpation test performed at the RDC/TMD-specified muscle sites in the masseter and temporalis used a range of 2 to 4 pounds of pressure rather than the 2 pounds specified by the RDC/TMD examination protocol for muscle palpation. The examiner used the spade-like pad of one finger to apply this pressure to the surface of the muscle while moving the finger back and forth across the long axis of the muscle fibers. This palpation technique was maintained for no more than 5 seconds. To locate areas associated with potential pain referral, the examiner: (1) placed the muscle on a slight stretch; (2) located so-called “taut bands” in the temporalis and masseter muscles by palpating across or along the long axis of the muscle fiber; (3) slid the finger across the muscle fibers or along the muscle fibers (with muscle slightly stretched); and/or (4) asked the subject to clench his/her back teeth together while the area of greatest muscle bulk during the contraction was examined. 
The modified joint palpation test for evaluating joint pain was as follows: the examiner requested the participant to “Open slightly so your teeth are not touching.” The examiner then located the lateral pole of the TMJ and, keeping an edge of the palpating finger on the lateral pole of the participant’s TMJ, the examiner orbited his/her finger around the lateral pole using a range of 2 to 3 pounds of pressure with a target of at least 2 pounds. A range of palpation pressure was used for this latter test because, like the myofascial technique, it required motion while applying the pressure and our collective experience was that it is not always possible to apply an exact pressure. Joint loading with opening (31) and the use of a stethoscope were additional methods for assessing joint noise that were used to supplement the published RDC/TMD auscultation method. The participants’ report of exam-induced joint noise was also recorded. If the participant reported distinct sounds such as clicking, popping, or snapping sounds, these were recorded as a “click,” and longer-duration sounds, including crunching, grinding, or grating sounds, were recorded as “crepitus.” If any exam test elicited a report of pain, or if pain occurred with clicking noises, then the participant was asked if this pain was a “familiar pain,” that is, pain similar to or like what he/she had been experiencing from the target condition outside the examination setting. Participants with a report of pain were also asked to indicate if the pain was referred and, if so, at what other site it was felt. The occlusal assessment included recording the number of teeth, overbite, crossbite, and midline discrepancy.(39,40) Occlusal intercuspal contacts were assessed using Shim stock® (Almore International Inc. Portland, Oregon) in maximum intercuspal position (MIP). (41) Centric relation position (CR) and CR-to-MIP slides were also assessed. (42)
Imaging of participants included a panoramic radiograph, bilateral TMJ magnetic resonance imaging (MRI) and computed tomography (CT) scans. The image analysis criteria used by the radiologists to identify MRI-disclosed disc displacements and CT-disclosed osteoarthrosis are described in detail elsewhere.(43) Briefly, the criterion for osteoarthritis/osteoarthrosis was the presence of deformation due to subcortical cyst, surface erosion, osteophyte, or generalized sclerosis. Osseous flattening and/or subcortical sclerosis were considered indeterminate for these diagnoses. The criteria in the sagittal plane for a normal disc position in the closed mouth position were that the border between the low signal of the disc and the high signal of the retrodiscal tissue was located between the 11:30 and 12:30 clock positions and that the intermediate zone was located between the condyle and the articular eminence. For the closed mouth position, a diagnosis of disc displacement was rendered when these two criteria were not met. In the open mouth position, to be normal, the intermediate zone was located between the condyle and eminence, and with persistent disc displacement, the intermediate zone was anterior to the superior aspect of the condyle.
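The closed-mouth disc position rule summarized above can be written as a simple decision function. This is only an illustrative sketch: the encoding of the clock position as signed minutes from 12:00 (so that 11:30 is -30 and 12:30 is +30) and all names are assumptions introduced here, not part of the radiologists' scoring forms.

```python
def closed_mouth_disc_position(border_minutes_from_12: float,
                               zone_between_condyle_and_eminence: bool) -> str:
    """Classify closed-mouth sagittal disc position.

    border_minutes_from_12: hypothetical encoding of the clock position of the
        border between the disc and the retrodiscal tissue, in minutes
        relative to 12:00 (11:30 -> -30, 12:30 -> +30).
    zone_between_condyle_and_eminence: True if the intermediate zone lies
        between the condyle and the articular eminence.
    """
    border_in_range = -30 <= border_minutes_from_12 <= 30
    # Both criteria must be met for a normal disc position.
    if border_in_range and zone_between_condyle_and_eminence:
        return "normal"
    return "displaced"
```

For example, a border at the 12:00 position with the intermediate zone between the condyle and eminence would be classified as normal, whereas failing either criterion yields a displacement.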
The criterion examiners, using questionnaires and a semi-structured interview, reviewed the medical history and pain characteristics in order to rule out possible non-TMD pain conditions and to exclude individuals with co-morbid conditions (see exclusion criteria in Table 1). Participants reporting a history consistent with migraine were not excluded. However, if a participant presented for evaluation while having an active migraine headache, the participant was rescheduled for the clinical examination at a later date. In addition, panoramic radiography and a clinical exam, including assessment for warmth, swelling and redness of the tissue, were used to rule out odontogenic, soft tissue, and hard tissue pathology. Other pathology not targeted for inclusion in the project was ruled out with TMJ MRI and CT. In establishing the reference standard diagnoses, the criterion examiners considered self-report of pain in the last month; effect of jaw function, movement, parafunction and rest on the reported pain over the past month; replication of the reported pain on provocation using clinical tests (see Table 2); and the TMJ CT and MRI studies. The criterion examiners also considered both common and uncommon TMD conditions that were operationalized by the consensus of the criterion examiners (see Table 3).
The criterion examiners performed their evaluations within the following procedural framework. Each of two CEs interviewed and examined each participant blinded to each other’s findings. Using all available clinical information including the imaging studies with the radiologist’s interpretations, they independently rendered their criterion diagnoses. They then compared their findings and, if either CE differed with the other’s findings or diagnoses, the participant was reexamined by both of them to resolve the area of disagreement. If either CE disagreed with the radiologist’s interpretation, the radiologist was consulted for further review of the images with the CEs. The reference standard diagnoses were then established by consensus between the CEs. The study’s requirement of a consensus between 2 independent examiners was designed to reduce the likelihood of diagnostic error. The estimated absolute error associated with a single exam is reported in the Results section.
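The comparison step in this procedural framework can be sketched as follows. This is a simplified illustration, not the study's data system: diagnoses are represented as plain string labels (for example "Ia" or "IIIc", without the per-joint detail used in the study), and the function names are hypothetical.

```python
from typing import Set, Tuple


def disagreements(ce1: Set[str], ce2: Set[str]) -> Set[str]:
    """Diagnoses rendered by exactly one of the two criterion examiners."""
    return ce1 ^ ce2


def consensus_step(ce1: Set[str], ce2: Set[str]) -> Tuple[str, Set[str]]:
    """Decide whether the independent diagnoses already agree.

    Returns ("consensus", diagnoses) when both examiners agree, or
    ("reexamine", disputed_diagnoses) when the participant must be
    reexamined by both examiners to resolve the disagreement.
    """
    disputed = disagreements(ce1, ce2)
    if disputed:
        return ("reexamine", disputed)
    return ("consensus", set(ce1))
```

In this sketch, the reference standard is only recorded once the disputed set is empty, which mirrors the requirement that both examiners reach consensus.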
A total of 9 clinicians served as the examiners for the Axis I validation study, including 2 CEs and 1 dental hygienist (test examiner; TE) at each study site. All 6 of the CEs were specialists in TMD and orofacial pain dentistry; CEs had between 12 and 38 years of experience in research and clinical management of TMDs. The 3 dental hygienists who served as the TEs were trained and calibrated to perform the RDC/TMD examination protocol. The radiologists at the UM and UW were diplomates of the American Board of Oral and Maxillofacial Radiology and the radiologist at UB was a diplomate of the American Board of Radiology and Neuroradiology; radiologists had between 12 and 23 years of experience interpreting TMJ images.
Based on STARD terminology, (26) the data collection for this project was prospective in that all history, exam, and imaging data collections were planned before the index test (RDC/TMD procedures) and the criterion examination procedures for the reference standard were performed.
Identical data collection protocols were performed at each study site (Figure 1). Participants who met initial screening criteria, as assessed by the study coordinator using a structured interview, were scheduled for Visit 1. They were asked to complete the baseline self-report instruments 1 day prior to their first appointment. The baseline data collection instruments included the RDC/TMD History Questionnaire, (1) Medical History Inventory, and Supplemental History Questionnaire (Table 2).
The index test, i.e., the algorithmically derived RDC/TMD diagnoses based on the TE examination findings, and the reference standard, i.e., the consensus diagnoses rendered by the 2 CEs, were both performed on the same day. The index test exam was always completed before the reference standard diagnosis was established.
Criterion examiner reliability: Beginning at baseline and over the course of the project, 3 sessions were planned for which a single CE from each study site came to the University of Minnesota for assessment of criterion examination diagnostic reliability. Each examiner performed the same criterion protocol on each study participant prior to all 3 examiners coming together to render a consensus diagnosis. This study design allowed for an overall estimate of diagnostic agreement between the individual criterion exam diagnoses and the consensus-based reference standard. It also provided an estimate of interexaminer reliability by comparing the individual criterion exam findings across the 3 examiners. Twenty-six participants were assessed over these 3 sessions that were programmed to occur after one of the annual calibration exercises, as described in the second paper in this series. (21)
In addition, within each study site, assessment of diagnostic agreement between the criterion exam and the reference standard was made possible because, for all study participants, the CE-2 criterion exam and the reference standard consensus were performed the same day.
At baseline and on a yearly basis over the course of the study, 4 exercises were planned for the assessment of the reliability of the study radiologists. (43) Calibration of the radiologists from the three sites began with their review of and discussion regarding a representative sample of panoramic radiographs, CT and MRI showing all osseous characteristics from normal to frank OA. In addition, MRI was used for demonstrating normal disc position, disc displacement with reduction, and disc displacement without reduction as well as effusions. For reliability assessment, each radiologist viewed panoramic radiographs; representative axially corrected coronal and sagittal slices from CT; and open- and closed-mouth sagittal views of PD-MRI and T2-MRI. For the initial reliability study, the images were collected from prior studies or teaching files from the three research locations. For the three subsequent annual reliability studies, the images used were from the participants in the current project that were selected by one of the University of Minnesota radiologists to represent all the intra-articular disorders. The selected images represented the full scope of possible diagnoses presented in random order. Each of the radiologists interpreted panoramic radiographs, CT and MRI blinded to each other’s findings and the clinical data. The images were scored according to the criteria developed for the RDC/TMD Validation Project. For the initial reliability assessment, 59 joints seen on panoramic radiographs, 70 CT and 70 MRI were used to assess for osteoarthritis, and 68 MRI for disc position. For the subsequent reliability studies, 20 panoramic radiographs, 25 CT in closed mouth, and 25 MRI sets in closed and open mouth were selected to represent all the intra-articular disorders. These CT, MRI, and panoramic radiographs were grouped as sets, but a given set did not represent the same participant. All responses on the data collection forms were categorical.
Among all the questionnaires employed in this project, only 3 questions were used as required determinants for Axis I diagnoses. All three were part of the published RDC/TMD History Questionnaire. (1) These were: Question #3, “Have you had pain in the face, jaw, temple, in front of the ear, or in the ear in the past month?”; Question #14a, “Have you ever had your jaw lock or catch so that it would not open all the way?”; and Question #14b, “Was this limitation in jaw opening severe enough to interfere with your ability to eat?” Test-retest reliability assessment of the RDC/TMD History Questionnaire and the Supplemental History Questionnaire was performed on a subset of participants who participated in Axis I assessment at UB and UW. Reliability results for only Questions #3, #14a, and #14b are reported in this paper.
Proc Freq (SAS Institute) was employed to compute percent agreement between examiners. Kappa (k) was specified as the primary measure of reliability of diagnostic renderings. Kappa was also the primary measure for estimating diagnostic agreement between the criterion exam protocol and the reference standards. These estimates were computed using generalized estimating equations (GEE) techniques based on a procedure described by Williamson et al. (44) These GEE procedures provided adjustment for side-to-side correlation within participants for diagnostic renderings.
Reliability for the radiograph interpretations was computed using simple kappa, because there was no issue of correlated data in these data sets. The films employed for all radiology calibration exercises were either right or left side films for any given participant, but not both sides. Stata statistical software was employed to obtain these estimates across the 3 examiners. (45)
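For readers unfamiliar with the statistic, simple (unadjusted) kappa for two raters can be sketched as below. This is an illustration of the basic two-rater Cohen's kappa only; it is not the GEE-based procedure of Williamson et al. or the multi-examiner Stata computation used in the study, and the ratings shown are made-up values.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Unadjusted Cohen's kappa for two raters scoring the same items."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(r1)
    # Observed proportion of agreement.
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[cat] * c2[cat] for cat in set(c1) | set(c2)) / n ** 2
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical ratings (1 = diagnosis present, 0 = absent) for 10 joints.
rater_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
rater_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
example_kappa = cohen_kappa(rater_a, rater_b)
```

In this made-up example the raters agree on 8 of 10 joints (80%), yet kappa is only about 0.52 because much of that agreement is expected by chance.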
Three separate studies were performed for assessing Axis II of the RDC/TMD. Briefly, these studies addressed the following:
For the entire evaluation of the RDC/TMD Axis II instruments, 2 study psychologists supervised the biobehavioral data collection and trained the psychometrists. The detailed methods used in these 3 studies and the Axis II validity results for the published RDC/TMD protocol are presented in the fourth paper in this series.(23) Future papers will report on the other self-report measures, particularly as they relate to potentially expanding the domains for the RDC/TMD Axis II assessment.
Over the 3 study sites, a total of 1244 potential participants were screened. Of the 512 potential participants who did not enter the study, 373 were not eligible for the following reasons: current use of excluded medications or recreational drugs (79), failure to meet selection criteria at the time when selective recruitment was initiated in order to fulfill diagnostic recruitment goals of 100 of each TMD subgroup diagnosis (64), failure to meet the initial screening criteria (7 questions) for potential cases or controls (63), excluded medical conditions (40), inability to undergo MRI due to body metal (23), non-TMD orofacial pain disorder (21), dentures (18), ongoing litigation for jaw condition (14), ongoing TMD or dental treatments (12), ineligible age (10), medical history exclusion including TMJ surgery (8), trauma to jaw in last 2 months (8), pregnancy (7), and language barrier (6). One hundred and thirty-nine potential participants were eligible but did not enter the study, with the primary reasons being no time or a time conflict (48), a change of mind (35), failure to present for a scheduled visit (28), and unwillingness to undergo imaging, including because of claustrophobia (28). A total of 732 participants were enrolled and 724 completed the study, with 8 drop-outs or incomplete assessments (Figure 1). Of these 724 participants, there was insufficient evidence to classify 5 of them as either case or control and they were excluded from the analysis. The remaining 719 participants included 628 TMD cases and 91 controls. Fourteen of these 628 cases were subsequently excluded from the Axis I analyses due to the presence of chondromatosis (n = 2), reported fibromyalgia (n = 9), or reported rheumatoid arthritis (n = 3). (Participants with a documented medical diagnosis of fibromyalgia or rheumatoid arthritis were eligible for the study.) Chondromatosis was excluded based on suspicion of the presence of the disorder as detected on MRI by the radiologist.
Thus, a total of 614 cases remained for the Axis I analysis; these participants presented with a total of 2,202 diagnoses, or an average of 3.59 diagnoses per case (Table 4). The Axis II analyses included all 628 cases, excluding only those with insufficient evidence to be classified as case or control. The 91 controls had no signs of TMD and had negative current history, exam, and imaging (MRI, CT, and panoramic radiograph) findings. Of these 91 controls, 80 had no lifetime history of TMD symptoms (i.e., “supercontrols”) and 11 of the controls had no current history (within the past 6 months), but had a prior history of symptoms consistent with TMDs (see inclusion criteria in Table 1). Of the 614 TMD cases used for the Axis I analyses, 24% were direct referrals from local health care providers to the university-based TMD clinics at the 3 sites (clinic cases), and 76% were respondents to study flyers and advertisements (community cases). Figure 2 is a Venn diagram presenting the distribution of cases with Group I Muscle Disorders, Group II Disc Displacements and Group III Arthralgia, Arthritis, Arthrosis, based on the CE consensus diagnoses.
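The participant flow reported above can be tallied step by step. The sketch below uses only the counts given in the text; the variable and function names are illustrative.

```python
# Counts taken from the text.
SCREENED = 1244
INELIGIBLE = 373
ELIGIBLE_NOT_ENTERED = 139
DROPOUTS = 8
UNCLASSIFIABLE = 5
CASES = 628
CONTROLS = 91
AXIS1_EXCLUDED = 2 + 9 + 3  # chondromatosis, fibromyalgia, rheumatoid arthritis
AXIS1_DIAGNOSES = 2202


def participant_flow() -> dict:
    """Derive each stage of the participant flow from the reported counts."""
    not_entered = INELIGIBLE + ELIGIBLE_NOT_ENTERED
    enrolled = SCREENED - not_entered
    completed = enrolled - DROPOUTS
    classified = completed - UNCLASSIFIABLE
    axis1_cases = CASES - AXIS1_EXCLUDED
    return {
        "not_entered": not_entered,
        "enrolled": enrolled,
        "completed": completed,
        "classified": classified,
        "axis1_cases": axis1_cases,
        "mean_diagnoses_per_case": round(AXIS1_DIAGNOSES / axis1_cases, 2),
    }


flow = participant_flow()
```

Each derived value matches the reported figures: 512 not entered, 732 enrolled, 724 completed, 719 classified (628 cases plus 91 controls), 614 Axis I cases, and 3.59 diagnoses per case.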
Table 5 summarizes the participant demographic variables including gender, age, race, education level, and income, and the Axis II clinical characteristics including characteristic pain intensity, duration of pain, depression, nonspecific physical symptoms, pain-related disability, and number of RDC/TMD diagnoses.
Only one adverse event occurred, when a participant’s jaw locked closed during the examination. This condition was addressed at the time of the event. The participant was advised to return if this symptom recurred; she did not return.
Intersite interexaminer reliability (n = 26) for the criterion exam was excellent (k = 0.81 to 0.91) for 7 of the 8 RDC diagnoses; for osteoarthrosis (IIIc), reliability was good (k = 0.59). The percent agreement ranged from 88–97%, with an average percent agreement of 93.5 and an absolute error of less than 7% among the 3 criterion examiners (Table 6). Absolute error, or percent disagreement, is the complement of percent agreement (PA), that is, 100% – PA.
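The relationship between percent agreement and absolute error can be illustrated with a short sketch. The paired ratings below are invented for illustration; only the average percent agreement of 93.5 comes from the text.

```python
from typing import Hashable, Sequence


def percent_agreement(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Percentage of items on which two sets of ratings agree."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    return 100.0 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def absolute_error(pa: float) -> float:
    """Absolute error (percent disagreement) is the complement of PA."""
    return 100.0 - pa


# Hypothetical paired ratings: agreement on 3 of 4 items.
example_pa = percent_agreement([1, 0, 1, 1], [1, 0, 0, 1])

# Reported average percent agreement across the 3 criterion examiners.
reported_error = absolute_error(93.5)
```

With the reported average percent agreement of 93.5, the absolute error is 6.5%, consistent with the stated bound of less than 7%.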
The overall criterion examination agreement by the 3 examiners with the consensus diagnosis was excellent, with a range in kappa from 0.82 to 0.94, except for the diagnosis of osteoarthrosis (k = 0.53) (Table 6). In this sample of just 26 participants, the prevalence of osteoarthrosis was very low at 14%. The absolute error associated with a single exam is estimated as the average error for the 3 examiners relative to the consensus diagnoses, and was observed to be less than 6%. These data indicate that the findings of a single criterion exam agreed with the consensus rendering more than 94% of the time (Table 6).
Intrasite agreement between the second criterion exam and the consensus (n = 724) was very high, with a range of k from 0.95 to 0.98. Percent agreement was 98–99%, with an average of 98.9% and an absolute error of less than 2% (Table 6).
Results reported here are overall agreement computed over the 4 different calibrations that were done during the study. The radiologists’ interrater reliability for reading the CT-depicted hard tissues (osteoarthritis/osteoarthrosis) and MRI-depicted soft tissue (disc position) was good to excellent (k = 0.71 and 0.84, respectively), and is reported separately. (43)
For the published RDC/TMD History Questionnaire, (1) the test-retest reliability for Questions #3, #14a, and #14b was excellent (k = 0.84, 0.76, and 0.75, respectively).
To improve reporting and comparisons between studies, we used standardized methodology for assessing diagnostic accuracy in conformance with STARD recommendations.(26) Testing diagnostic accuracy requires a credible reference standard to assess criterion validity. The credibility of the criterion examination protocol derives initially from the fact that it parallels what is done for comprehensive exams in clinical practice. It also has content validity because experts in the field using the current knowledge base developed it.
The results in Table 6 provide further support for the credibility of the criterion examination protocol. It is associated with high interexaminer agreement for the criterion exam (k = 0.59 to 0.91) and high agreement when the individual criterion diagnoses are compared with the reference standard for Axis I TMD clinical diagnoses (k = 0.53 to 0.94). To our knowledge, there is no comparison in the TMD literature between a criterion examiner and a reference standard. The two kappas that were less than 0.75 (the level considered to be excellent agreement) were associated with osteoarthrosis, for which the sample prevalence was just 14%. It has previously been shown that the magnitude of the reliability coefficients depends on the prevalence of the disorder. (58,59) The reliability of the radiologists’ interpretation of the images at each site was assessed four different times over the course of this project and, overall, was shown to be good for CT (hard tissue) and excellent for MRI (soft tissue) (k = 0.71 and 0.84, respectively). A detailed description of the results of these reliability studies is reported separately. (43)
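The dependence of kappa on prevalence can be seen in a small numerical sketch. The two agreement tables below are hypothetical (they are not study data) and are constructed so that both show 90% observed agreement between two raters; only the proportion of positive ratings differs. Cohen's kappa for two raters is assumed, and the function name is introduced here for illustration.

```python
def kappa_from_table(both_pos, only_a, only_b, both_neg):
    """Cohen's kappa from the four cells of a 2x2 agreement table."""
    n = both_pos + only_a + only_b + both_neg
    p_obs = (both_pos + both_neg) / n
    a_pos = (both_pos + only_a) / n  # proportion rated positive by rater A
    b_pos = (both_pos + only_b) / n  # proportion rated positive by rater B
    p_exp = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (p_obs - p_exp) / (1 - p_exp)


# Two hypothetical tables of 100 participants, each with 90% observed agreement.
balanced = kappa_from_table(45, 5, 5, 45)  # each rater: 50% rated positive
rare = kappa_from_table(9, 5, 5, 81)       # each rater: 14% rated positive

print(f"kappa at 50% positive ratings: {balanced:.2f}")
print(f"kappa at 14% positive ratings: {rare:.2f}")
```

With identical observed agreement, kappa falls from 0.80 in the balanced table to about 0.58 in the low-prevalence table, because chance agreement on the absence of the condition is much higher when the condition is rare.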
The reference standard for pain used in the present project was built on what is known about TMD, in addition to paralleling what is done to diagnose other chronic pain problems. The diagnosis of arthralgia and myofascial pain included both the original test items (provocation tests) specified in the RDC/TMD as well as additional test items. These latter tests, vetted by the project’s AP, are tests currently used in research and clinical practice. (31–38) If any of the provocation tests elicited a complaint of pain from the participant, the participant was requested to report whether the pain was familiar, that is, similar to or like the pain they experienced from the target condition. This methodology has been used successfully to establish reference standards for assessment of pain in other medical classification schemes. (60–68) The requirement of familiar pain endorsement helps to minimize false positive diagnoses for cases where the pain endorsement is more the result of the provocation test than related to a true pain disorder. It is well understood that provocation tests can provoke pain in controls, as well as pain not previously experienced in cases. Finally, the use of 2 independent criterion examiners for establishing the reference standard parallels what has been done to develop diagnostic criteria for fibromyalgia. (69) The reference standard used for fibromyalgia, a musculoskeletal disorder, was a consensus diagnosis between 2 rheumatologists who independently assessed each participant with all available clinical data including a semistructured history and exam.
Establishing the reference standard for assessing the presence of intra-articular disorders is less complex than for that of pain, given the availability of sophisticated, noninvasive imaging techniques that do not alter the structure being examined. For assessment of soft and hard tissue intra-articular anatomy, MRI and CT, respectively, are standard clinical imaging techniques. The images in this project were obtained using protocols standardized between sites with multiple views of the participant’s TMJ for both MRI and CT. All images were also reviewed by both CEs. If there was any question about the radiologist’s findings, the 2 CEs and the radiologist reviewed the images together, with the radiologist rendering the final decision on the interpretation of the images. This methodology was designed to minimize diagnostic misclassification.
The study was designed to include a diverse participant population with a full spectrum and severity of TMD signs and symptoms, and Axis II characteristics that were consistent with literature reports of population-based (70–75) and clinical studies. (5,76–84) In addition, controls were recruited with no lifetime history of TMD symptoms, or with a prior history of TMD symptoms dating 6 months or more before their examination, but with no current symptoms. This recruitment strategy again allowed for a spectrum of participants, ranging from “supercontrols” with no lifetime history of TMD to controls with some past history of TMD-like pain. In the absence of well-defined criteria for normalcy in terms of TMD conditions, this approach for defining TMD controls is consistent with literature reports that used the absence of any RDC/TMD diagnosis (1) or the absence of any signs and symptoms included in the Helkimo Indices (85) to define a control. (86)
For three reasons, we believe sampling bias that could affect the study’s estimates of diagnostic accuracy is minimal. First, sensitivity and specificity estimates are theoretically independent of prevalence of the target conditions. (87) Second, the cases and the controls covered the spectrum of signs and symptoms observed with the presence or absence of TMD conditions. Third, sensitivity and specificity for diagnosing TMD pain or intra-articular disorders would not likely vary significantly based on the past history of the disorder, presence of co-morbid conditions or other exclusion criteria. We also believe that the study sample of target conditions is likely to be representative of participants to whom the test will be applied in future research and clinical settings, which is the fundamental requirement of studies investigating diagnostic test accuracy. (88) This study was, however, limited to study population specifications recommended by STARD (89) as a first step for the validity testing of a diagnostic instrument and, as such, was not designed to provide sensitivity and specificity estimates in patients with co-morbid conditions or other exclusions specified for this study.
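The first of these points can be illustrated with a short sketch. It builds expected 2×2 tables for a test with fixed sensitivity and specificity at two different prevalences, then recomputes the accuracy measures from each table. All numbers (sample size, sensitivity 0.90, specificity 0.80, and the two prevalences) are hypothetical values chosen for illustration, not study estimates, and the function names are likewise assumptions made for the example.

```python
def two_by_two(n, prevalence, sensitivity, specificity):
    """Expected 2x2 table counts (tp, fp, fn, tn) for a sample of size n."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity
    fn = diseased - tp
    tn = healthy * specificity
    fp = healthy - tn
    return tp, fp, fn, tn


def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, and positive predictive value from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# Same hypothetical test applied to samples with 10% and 50% prevalence.
low = accuracy_measures(*two_by_two(1000, 0.10, 0.90, 0.80))
high = accuracy_measures(*two_by_two(1000, 0.50, 0.90, 0.80))

for label, m in (("prevalence 10%", low), ("prevalence 50%", high)):
    print(label, {k: round(v, 3) for k, v in m.items()})
```

Sensitivity and specificity recovered from the two tables are identical, whereas the positive predictive value rises from about 0.33 to about 0.82. This is why the argument above concerns sensitivity and specificity specifically; predictive values would depend on the case mix of the sample.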
A critical issue in establishing a reference standard is to identify and address any potential for circularity. Circularity occurs when cases and controls are intentionally selected based on characteristics that the test protocol is specifically designed to detect. It also occurs if the reference standard too closely resembles the test protocol. If either of these conditions exists, the estimate of validity will be spuriously inflated. These issues were addressed in the present project by (1) inclusion of participants as cases that would not meet criteria for an RDC/TMD diagnosis; (2) a CE assessment protocol that contained all items stipulated by the RDC/TMD with the addition of independent diagnostic tests composed of additional history taking, exam procedures, and imaging including TMJ MRI and CT; (3) independent examination of participants by 2 examiners who then established consensus diagnoses as the reference standards; and (4) the use of an expanded reference standard taxonomy that was independent of the RDC/TMD and included disorders not specified by the RDC/TMD.
The Axis I reference standards for this project could be in error for several reasons, owing either to the inherent variability in the clinical phenomena or to systematic error in the examiners’ measurements. Pain to palpation of the TMJ capsule is inherently variable, and this measurement is critical for determining a diagnostic subgroup. Systematic error can occur if the examiner knows the participant’s questionnaire responses, (58) resulting in a diagnostic suspicion bias that can “influence both the intensity and the outcome of the diagnostic process”.(59) Finally, all provocation tests can potentially result in pain, even in pain-free controls. Thus, there was a clear need to verify the clinical relevance of exam-induced pain by determining whether it was familiar to the participant as the pain complaint and could be verified by the two criterion examiners.
Advancement in our understanding of the prevalence, etiologies, natural progression, and treatment of TMD is dependent on having reliable and valid diagnostic criteria. In studies of diagnostic accuracy, a reference standard is required to differentiate cases with the target condition from controls, and to assess the criterion validity of the index test. The primary goal of this paper was to describe in detail the methods used for establishing reference standard diagnoses for assessing the validity of Axis I measures of the RDC/TMD. The Axis I criterion procedures that were developed have content validity and acceptable reliability. It is concluded that this methodology constituted a credible reference standard for assessment of Axis I diagnostic validity, and for revision of the published RDC/TMD Axis I diagnostic scheme. Furthermore, the study participant demographics and clinical characteristics are appropriate for assessing the validity of the RDC/TMD. Finally, for RDC/TMD Axis II biobehavioral instruments, assessment of criterion, convergence, and concurrent validity was performed using previously validated reference standards.
Acknowledgement of Validation Project Study Group
University of Minnesota: Eric L. Schiffman, DDS, MS, Study Principal Investigator; John Look, DDS, PhD, Lead Epidemiologist; Gary Anderson, DDS, MS, Co-Investigator; Mansur Ahmad, DDS, PhD, Radiologist; Quentin Anderson, MD, Radiologist; Lois Kehl, DDS, PhD, Basic Scientist; Wei Pan, PhD, Statistician; Feng Tai, MS, Statistician; Patricia Lenton, RDH, MA, Examiner & Study Coordinator; Amanda Jackson, BA, CCRP, Study Coordinator; Mary Haugan, BA, Data Manager; and Linda Kingman, Administrative Support.
University at Buffalo: Richard Ohrbach, DDS, PhD, Site Principal Investigator & Lead Psychologist; Yoly Gonzalez, DDS, MS, Co-Investigator; Krishnan Kartha, MD, Radiologist; Leslie Garfinkel, RDH, Examiner; Sharon Michalovic, BS, Research Manager and Psychometrist; and Teresa Speers, RN, Study Coordinator.
University of Washington: Edmond L. Truelove, DDS, MSD, Site Principal Investigator; Earl Sommers, DDS, MSD, Co-Investigator; Kimberly Huggins, RDH, BS, Research Manager & Examiner; Lars Hollender, DDS, Odont. Dr., Radiologist; Lloyd Mancl, PhD, Statistician; Jeffrey Sherman, PhD, Psychologist; Kathy Scott, BA, Study Coordinator; Joanne Harman, BA, MA, Study Coordinator, and Julie Sage, BS, Study Coordinator and Psychometrist.
This study was supported by NIH/NIDCR U01-DE013331.
Eric L. Schiffman, University of Minnesota School of Dentistry, Department of Diagnostic and Biological Sciences, 6-320 Moos Tower, 515 Delaware Street SE, Minneapolis, MN 55455, Telephone: 612-625-5146, Fax: 612-626-0138.
Edmond L. Truelove, University of Washington School of Dentistry, Department of Oral Medicine, Box 356370, Seattle, WA 98195.
Richard Ohrbach, University at Buffalo School of Dental Medicine, Department of Oral Diagnostic Sciences, 355 Squire Hall, Buffalo, NY 14214.
Gary C. Anderson, University of Minnesota School of Dentistry, Department of Diagnostic and Biological Sciences, 6-320 Moos Tower 515 Delaware Street SE, Minneapolis, MN 55455.
Mike T. John, University of Minnesota School of Dentistry/School of Public Health, Department of Diagnostic and Biological Sciences, 6-320 Moos Tower, 515 Delaware Street SE, Minneapolis, MN 55455.
Thomas List, Department of Stomatognathic Physiology, Faculty of Odontology, Malmö University, SE 205 06 Malmö, Sweden.
John O. Look, University of Minnesota School of Dentistry, Department of Diagnostic and Biological Sciences, 6-320 Moos Tower, 515 Delaware Street SE, Minneapolis, MN 55455.
University at Buffalo, South Campus
Buffalo, NY 14215