Special Article

The Methodologic Foundations of Studies of the Appropriateness of Medical Care

Charles E. Phelps

N Engl J Med 1993; 329:1241-1245. October 21, 1993

As health care costs continue to increase rapidly, both health care providers and consumers have expressed concern that the additional resources used for health services do not provide commensurate increases in health benefits. Adding fuel to this concern, a number of disquieting studies have estimated the rates of “inappropriate” use of a variety of procedures, such as coronary angiography, carotid endarterectomy, endoscopy, and coronary-artery bypass graft surgery, in a variety of settings1-4. The estimated rates of inappropriate treatment have ranged from about 15 to 30 percent, reaching as high as 40 percent for particular procedures at individual institutions. Recent studies estimated a rate of 16 percent for inappropriate hysterectomy in seven health maintenance organizations,5 a rate of 24 percent for inappropriate days spent in a Canadian children's hospital,6 and a rate of 23 percent for inappropriate hospitalizations for measles7. The only results that run counter to these high rates have come from recent studies of coronary-artery bypass graft surgery (2 percent inappropriate use),8 percutaneous transluminal coronary angioplasty (4 percent),9 and coronary angiography (4 percent)10 in New York. One study concluded that “a substantial fraction of hospitalization is potentially avoidable”11. A leader in studies of medical appropriateness has stated, “If one could extrapolate from the available literature, then perhaps one fourth of hospital days, one fourth of procedures, and two fifths of medications could be done without”12. If this is true, then the country's annual health care bill could be cut by perhaps $100 billion without harm to the public.

Within this overall picture, several puzzles have emerged. First, the estimated rates of inappropriate use provide little explanation for widespread differences between geographic regions in the rates of use of specific treatments2,3. Furthermore, in the Rand Health Insurance Study, the rates of inappropriate treatment did not vary among insurance plans, despite wide differences in both the generosity of the insurance and the actual amounts of health care used by subjects covered by the various plans11. One possible explanation is that the process of sorting candidates for medical and surgical interventions does not work well, but the results presented here suggest that another factor -- flawed estimates of the rates of inappropriate treatment -- may account for these findings.

Ratings of the appropriateness of medical interventions have been used to support practice guidelines12,13 and have been suggested for preoperative screening and even for studies of rates of inappropriate use of interventions by individual physicians12. Researchers using indicators of appropriate care have investigated how such indicators are created, but there has been little analysis of the fundamental characteristics of the methods. With this paper I hope to open a discussion of these methods, with the aim of improving them, sharpening their application, and stimulating further research to resolve some of the issues raised here.

Methods

The particular methods used in appropriateness studies differ according to the researcher, but they share a common approach. First, the researcher defines a medical intervention (e.g., carotid endarterectomy) for analysis and reviews the literature to find sets of clinical indications -- often numbering in the hundreds -- that have been suggested as sufficient to justify the intervention. This review of the literature, often involving a meta-analysis, also assists the expert panel that comes next in the process. Second, the researcher convenes a panel of experts to rank each of the indications on a scale of appropriateness, with scores commonly ranging from 1 (certainly inappropriate) to 9 (certainly appropriate). Third, the researcher employs a group of investigators, typically nurses, to abstract the records of patients in various institutions who received the intervention, looking for information relevant to the previously defined indications. Finally, the researcher matches each patient's abstracted record to the closest possible indication and assigns to that patient's treatment the appropriateness score associated with the indication. These or similar methods have formed the basis for most published studies of the appropriateness of medical interventions.

The methods used to study appropriateness have a number of intrinsic problems. They only study interventions that have already occurred (hence ignoring issues of inappropriate failure to perform an intervention), they ignore patients' preferences, and at best, they can reflect only the consensus of experts, who may have little clinical science on which to base their judgments. These problems have been addressed elsewhere,13 so this paper focuses on an as yet unexplored issue: Methods used to study appropriateness can lead to biased estimates of the rates of inappropriate treatment that may differ markedly from the true rates.

To analyze the problem of biased estimates, one can regard the methods used to study appropriateness as diagnostic tests that attempt to classify patients treated by community physicians as having been treated appropriately or inappropriately. Like any diagnostic test, these methods can create two types of errors: they can have false positive results (classifying treatments as inappropriate when they were appropriate) at a rate of 1 minus specificity per truly appropriate treatment and false negative results (classifying truly inappropriate treatments as appropriate) at a rate of 1 minus sensitivity per truly inappropriate treatment. The Appendix shows the relation between the estimated and true rates of inappropriate treatment as, first (equation 1),

Estimated rate = true rate × sensitivity + (1 - true rate) × (1 - specificity)
and second (equation 2), in terms relative to the true rate,
Estimated rate/true rate = sensitivity + [(1 - true rate)/true rate] × (1 - specificity).

This process can label a treatment as inappropriate either correctly, when it was inappropriate, or incorrectly, when the community doctors proceeded appropriately but the method mislabeled the treatment as inappropriate. The correct labeling of inappropriate treatment occurs at an overall rate equal to the true rate times the sensitivity. Incorrect labeling occurs at an overall rate equal to (1 minus the true rate) times (1 minus the specificity). Equation 1 shows that the estimated rate of inappropriate treatment is the sum of the correctly and incorrectly labeled cases, thus creating the potential for biased estimates.
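Equations 1 and 2 can be written as two small functions. This is a minimal sketch; the function names and the input values (a true rate of 0.10 with sensitivity and specificity of 0.80) are illustrative assumptions, not estimates from any study.

```python
def estimated_rate(true_rate, sensitivity, specificity):
    """Equation 1: share of treatments that the method labels inappropriate."""
    correct = true_rate * sensitivity                 # truly inappropriate, labeled inappropriate
    incorrect = (1 - true_rate) * (1 - specificity)   # truly appropriate, labeled inappropriate
    return correct + incorrect


def bias_ratio(true_rate, sensitivity, specificity):
    """Equation 2: estimated rate divided by the true rate."""
    return sensitivity + (1 - true_rate) / true_rate * (1 - specificity)


# Illustrative inputs only (assumed, not taken from any appropriateness study).
rate = estimated_rate(true_rate=0.10, sensitivity=0.80, specificity=0.80)
ratio = bias_ratio(true_rate=0.10, sensitivity=0.80, specificity=0.80)
print(round(rate, 3), round(ratio, 2))
```

With these assumed inputs the false positive term (0.18) is more than twice the true positive term (0.08), which is the mechanism behind the bias.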

To compare this with a familiar clinical problem, suppose a physician screened 1000 patients for a disease present in 4 percent of the population, using a test with 95 percent sensitivity and specificity. The test would, on average, correctly identify 38 of the 40 truly sick patients and falsely identify 48 of the 960 healthy patients as sick. The test would produce positive results 86 times out of 1000. In the notation of equation 1, the estimated rate would be 0.086, and in equation 2, the ratio of the estimated to the true rate would be 2.15 (0.086/0.04). In other words, the estimated rate would be more than double the true rate. As with any diagnostic test, the predictive value of a positive result depends heavily on the underlying rate of occurrence of the event measured by the test. False positives commonly outnumber true positives when the underlying rate of occurrence is low or the false positive rate is very far from zero (or both).
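The screening arithmetic can be checked directly. The sketch below uses only the numbers given in the text; the variable names and the final predictive-value line are additions for illustration.

```python
patients = 1000
prevalence = 0.04      # disease present in 4 percent of the population
sensitivity = 0.95
specificity = 0.95

sick = patients * prevalence                      # 40 truly sick patients
healthy = patients - sick                         # 960 healthy patients
true_positives = sick * sensitivity               # about 38 correctly identified
false_positives = healthy * (1 - specificity)     # about 48 falsely identified

positives = true_positives + false_positives      # about 86 positive results
estimated = positives / patients                  # equation 1
ratio = estimated / prevalence                    # equation 2
positive_predictive_value = true_positives / positives

print(round(positives), round(estimated, 3), round(ratio, 2))
print(round(positive_predictive_value, 2))
```

The last line reports the predictive value of a positive result: under these inputs fewer than half of the positive results belong to truly sick patients.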

With a perfect diagnostic device (sensitivity = specificity = 1), the estimated rate equals the true rate. We have no reason to believe, however, that the methods used to study appropriateness have perfect accuracy, since few if any diagnostic methods have ever achieved such accuracy, particularly for problems such as those considered in appropriateness studies. The members of the expert panels that create appropriateness ratings for each of the many indications for a particular treatment often disagree (i.e., different ratings of appropriateness are assigned by different panel members). On the few occasions when independent ratings from a number of panels have been applied to similar populations, they reveal some disagreement,14-16 but no “pure” tests exist in the literature. However, these studies strongly suggest that the methods used to study appropriateness cannot always have perfect sensitivity and specificity.

Results

If one plots the estimated rate as a function of the true rate (equation 1), the result is a straight line with the vertical intercept equal to the false positive rate and the slope equal to the true positive rate minus the false positive rate. Figure 1 (Relation between the Estimated and True Rates of Inappropriate Treatment, Assuming a Sensitivity and Specificity of 80 Percent) shows such a graph, as well as the diagonal line representing a perfectly accurate test. As the figure demonstrates, when the true rate becomes sufficiently large, the estimated rate falls below it. For true rates below this cutoff point, the estimated rate exceeds the true rate. If sensitivity and specificity are equal, then the crossover point always occurs when the true rate is 0.5, a rate exceeding all estimated rates of inappropriate treatment. As intuition suggests, the methods used to study appropriateness generally understate the true rate only when the false positive rate of the method is very small, the true rate of inappropriate treatment is quite large, or both.
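The crossover point follows from setting the estimated rate equal to the true rate in equation 1 and solving for the true rate, which gives (1 - specificity) / [(1 - sensitivity) + (1 - specificity)]. The sketch below computes the intercept, slope, and crossover; the function names are illustrative assumptions.

```python
def line_parameters(sensitivity, specificity):
    """Intercept and slope of the estimated rate as a function of the true rate."""
    false_positive_rate = 1 - specificity
    intercept = false_positive_rate
    slope = sensitivity - false_positive_rate
    return intercept, slope


def crossover_true_rate(sensitivity, specificity):
    """True rate at which the estimated rate equals the true rate."""
    false_negative_rate = 1 - sensitivity
    false_positive_rate = 1 - specificity
    if false_negative_rate + false_positive_rate == 0:
        raise ValueError("perfect test: estimated and true rates coincide everywhere")
    return false_positive_rate / (false_negative_rate + false_positive_rate)


intercept, slope = line_parameters(0.80, 0.80)
print(round(intercept, 2), round(slope, 2))          # line plotted in Figure 1
print(round(crossover_true_rate(0.80, 0.80), 2))     # equal error rates
print(round(crossover_true_rate(0.95, 0.80), 2))     # higher sensitivity, same specificity
```

When sensitivity rises while specificity stays fixed, the crossover moves to a higher true rate, so overstatement covers an even wider range of true rates.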

To show the effects of classification errors, Figure 2 (Errors in Estimating Four True Rates of Inappropriate Treatment, Assuming a Sensitivity of 95 Percent) and Figure 3 (Errors in Estimating Four True Rates of Inappropriate Treatment, Assuming a Sensitivity of 80 Percent) show, using equation 2, the ratio of the estimated rate of inappropriateness to the true rate for various underlying true rates ranging from 0.05 to 0.2. Figure 2 shows the results when the methods used to study appropriateness have a high sensitivity (95 percent). In this case, the methods almost always overstate the true rate, and any understatement is trivially small. In Figure 3, the sensitivity is lower (80 percent), as would occur, for example, if the method used to study appropriateness went to great lengths to avoid falsely labeling doctors as having high rates of inappropriate treatment. Both figures show a flat line (the “no bias line”) where the ratio of the estimated to the true rate equals 1, to assist in determining combinations of true rate and specificity that lead to upward and downward biases.

These figures demonstrate a common pattern. First, the estimated rate understates the true rate only when the specificity is quite high, and the degree of understatement, when it occurs, is relatively small. Second, the estimated rate increasingly overstates the true rate as the specificity falls. Third, the problem gets worse as the true rate falls; in other words, better actual practice leads to larger relative errors in the estimated rates of inappropriate treatment.
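The pattern can be reproduced numerically from equation 2. In the sketch below, the four true rates (0.05, 0.10, 0.15, and 0.20) and the grid of specificities are assumptions chosen to span the range described for the figures; they are not read from the published plots.

```python
def bias_ratio(true_rate, sensitivity, specificity):
    """Equation 2: estimated rate divided by the true rate."""
    return sensitivity + (1 - true_rate) / true_rate * (1 - specificity)


TRUE_RATES = (0.05, 0.10, 0.15, 0.20)        # assumed grid of true rates
SPECIFICITIES = (1.00, 0.95, 0.90, 0.80)     # assumed grid of specificities


def ratio_table(sensitivity):
    """Map each specificity to the bias ratios for the four true rates."""
    return {
        sp: [bias_ratio(t, sensitivity, sp) for t in TRUE_RATES]
        for sp in SPECIFICITIES
    }


for sensitivity in (0.95, 0.80):
    print(f"sensitivity = {sensitivity:.2f}")
    print("specificity  " + "  ".join(f"t={t:.2f}" for t in TRUE_RATES))
    for sp, ratios in ratio_table(sensitivity).items():
        print(f"{sp:11.2f}  " + "  ".join(f"{r:6.2f}" for r in ratios))
```

A ratio above 1 means the method overstates the true rate. In each row the ratio is largest at the lowest true rate, and the ratio drops below 1 only in the rows with the highest specificity.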

It bears emphasizing that there are no estimates of the misclassification rates of methods used to measure appropriateness. To estimate the accuracy of these methods in the traditional fashion of evaluating diagnostic tests, one would need a gold standard of truth that we cannot know. Fortunately, several methods have been developed to analyze cases such as these. They require the simultaneous application of independent tests (i.e., independent panels on appropriateness) to the same population or (even better) to different populations. These maximum-likelihood methods allow estimation of both the true prevalence rates and the misclassification rates for diagnostic tests, in addition to the interrater reliability rates that such studies commonly provide17,18.
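To make the idea concrete, the following is a minimal sketch of one way to fit such a model: an expectation-maximization loop for two raters applied to two populations. It assumes that the two panels err independently given the true state and that each panel's sensitivity and specificity are the same in both populations. The function names, the counts, and the parameter values are all hypothetical, and this is not presented as the procedure used in the cited papers.

```python
from itertools import product


def cell_probs(prev, se, sp):
    """For each pair of ratings (1 = labeled inappropriate), return the joint
    probability with 'truly inappropriate' and with 'truly appropriate'."""
    probs = {}
    for r1, r2 in product((0, 1), repeat=2):
        p_inapp, p_app = prev, 1 - prev
        for r, s, c in ((r1, se[0], sp[0]), (r2, se[1], sp[1])):
            p_inapp *= s if r else 1 - s
            p_app *= 1 - c if r else c
        probs[(r1, r2)] = (p_inapp, p_app)
    return probs


def em_step(counts, prevs, se, sp):
    """One EM update. counts[k][(r1, r2)] is the number of cases in population k."""
    pos = [0.0, 0.0]      # expected truly inappropriate cases flagged by panel j
    neg = [0.0, 0.0]      # expected truly appropriate cases cleared by panel j
    inapp_total = app_total = 0.0
    new_prevs = []
    for k, table in enumerate(counts):
        probs = cell_probs(prevs[k], se, sp)
        inapp_k = n_k = 0.0
        for cell, n in table.items():
            p_inapp, p_app = probs[cell]
            w = p_inapp / (p_inapp + p_app)   # P(truly inappropriate | ratings)
            inapp_k += n * w
            n_k += n
            for j in (0, 1):
                if cell[j]:
                    pos[j] += n * w
                else:
                    neg[j] += n * (1 - w)
        new_prevs.append(inapp_k / n_k)
        inapp_total += inapp_k
        app_total += n_k - inapp_k
    new_se = [pos[j] / inapp_total for j in (0, 1)]
    new_sp = [neg[j] / app_total for j in (0, 1)]
    return new_prevs, new_se, new_sp


# Hypothetical "truth", used only to build expected counts for 1000 cases per population.
TRUE_PREVS, TRUE_SE, TRUE_SP = [0.10, 0.30], [0.90, 0.80], [0.95, 0.85]
counts = [
    {cell: 1000 * sum(p) for cell, p in cell_probs(prev, TRUE_SE, TRUE_SP).items()}
    for prev in TRUE_PREVS
]

prevs, se, sp = [0.2, 0.4], [0.7, 0.7], [0.7, 0.7]   # arbitrary starting guesses
for _ in range(2000):
    prevs, se, sp = em_step(counts, prevs, se, sp)
print(prevs, se, sp)
```

Real data would add sampling error, and the likelihood has a mirror-image solution in which the labels are swapped, so starting values and the plausibility of the fitted rates both need checking.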

Existing methods also allow the estimation of complete receiver-operating-characteristic curves, which show the various combinations of true positive and false positive rates that occur when one selects different cutoff values for diagnostic tests. These methods,19 like those previously discussed,17,18 require the use of more than one diagnostic test when there is no true gold standard. This technique could potentially be used in analyzing methods to study appropriateness, although some commentators urge caution because it relies on a consensus gold standard that may itself be biased20.

It is also worth noting that, once receiver-operating-characteristic curves have been estimated with appropriate methods,19 the arbitrary cutoff points usually chosen to define appropriateness (i.e., 1 to 3, inappropriate; 4 to 6, equivocal; and 7 to 9, appropriate) may potentially be improved by taking account of the costs of false positive and false negative errors and the underlying frequency of inappropriate treatment. The methods for this are well known21,22 but have not yet been applied to the study of appropriateness. They would suggest, for example, classifying any intervention with a score lower than, say, 5 (rather than the customary cutoff of 3) as inappropriate if the costs of false negative classifications were relatively large. Similarly, this approach suggests that the cutoff point be shifted in the other direction if the costs of false positive mistakes are relatively large (labeling treatments as inappropriate only if the scores are, say, 2 or lower). Although such an approach will not improve the accuracy of methods used to classify appropriateness, it will reduce the costs of misclassification errors.
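The cutoff choice can be sketched as a small expected-cost calculation. Everything numeric here is a hypothetical assumption for illustration: the operating points (true positive and false positive rates at each cutoff), the true rate of 0.15, and the relative costs. No such estimates exist for appropriateness methods.

```python
# Hypothetical operating points: a treatment is labeled inappropriate when its
# score is at or below the cutoff. Values are (true positive rate, false positive rate).
ROC_POINTS = {
    2: (0.55, 0.03),
    3: (0.75, 0.08),
    4: (0.88, 0.18),
    5: (0.95, 0.35),
}


def expected_cost(cutoff, true_rate, cost_false_negative, cost_false_positive):
    """Expected misclassification cost per treated patient at a given cutoff."""
    tpr, fpr = ROC_POINTS[cutoff]
    missed = true_rate * (1 - tpr) * cost_false_negative
    mislabeled = (1 - true_rate) * fpr * cost_false_positive
    return missed + mislabeled


def best_cutoff(true_rate, cost_false_negative, cost_false_positive):
    """Cutoff with the lowest expected misclassification cost."""
    return min(
        ROC_POINTS,
        key=lambda c: expected_cost(c, true_rate, cost_false_negative, cost_false_positive),
    )


# False negatives assumed five times as costly as false positives, then the reverse.
print(best_cutoff(true_rate=0.15, cost_false_negative=5, cost_false_positive=1))
print(best_cutoff(true_rate=0.15, cost_false_negative=1, cost_false_positive=5))
```

Under these assumed inputs the preferred cutoff rises above the customary 3 when false negatives are the costlier error and falls to 2 when false positives are costlier, the same direction of adjustment described in the text.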

A different approach would assess changes in the health of patients classified as appropriately or inappropriately treated. Both “healthy” and “incurable” patients treated inappropriately should have less improvement in health than those treated appropriately. Thus, studies of the changes in health status of patients categorized as inappropriately or appropriately treated should illuminate the validity of the process. If correctly classified, inappropriately treated patients should have no improvement in health, whereas appropriately treated patients should, at least on average, have some improvement.

Discussion

The overall value and credibility of methods to assess the appropriateness of medical interventions cannot be determined until studies estimate the sensitivity and specificity of this “diagnostic test.” The nature of diagnostic tests makes the chances of biased estimates quite high. The bias can occur in either direction, but the nature of the problem suggests that the magnitude of upward bias will be more severe if it occurs, and it is perhaps more likely to occur than downward bias. Only studies allowing estimates of the sensitivity and specificity of these methods can illuminate the direction and magnitude of any biases. There are currently no estimates of the misclassification rates of these methods, so any consideration of the consequences of using the appropriateness method must remain speculative.

If the methods used to study appropriateness do suffer from the problems identified here, that would offer one explanation for the lack of correlation between estimated rates of inappropriate treatment and overall rates of treatment identified in the literature2,3 and across experimental plans in the Rand Health Insurance Study11. Returning to equation 1, if true rates of inappropriateness are low, then estimated rates can be dominated by even small rates of false positive results, and hence may show little correlation with actual treatment rates, as these studies found.
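A short calculation shows how false positives can dominate the estimate when the true rate is low. The sensitivity and specificity of 90 percent and the three true rates below are assumed values for illustration only.

```python
def false_positive_share(true_rate, sensitivity, specificity):
    """Fraction of treatments labeled inappropriate that were in fact appropriate."""
    true_positives = true_rate * sensitivity
    false_positives = (1 - true_rate) * (1 - specificity)
    return false_positives / (true_positives + false_positives)


SENSITIVITY = 0.90   # assumed
SPECIFICITY = 0.90   # assumed

for true_rate in (0.02, 0.05, 0.10):
    estimated = true_rate * SENSITIVITY + (1 - true_rate) * (1 - SPECIFICITY)
    share = false_positive_share(true_rate, SENSITIVITY, SPECIFICITY)
    print(f"true rate {true_rate:.2f}: estimated {estimated:.3f}, "
          f"false positive share {share:.2f}")
```

With these inputs, a fivefold change in the true rate (from 0.02 to 0.10) moves the estimated rate by well under a factor of two, because the false positive term is nearly constant.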

The results from New York8-10 are the only anomaly in the general finding of high rates of inappropriate medical intervention1-7. Several issues bear mention. First, the authors of those studies noted that the regulatory environment in New York may lead to important differences in rates of inappropriate care8-10.

Second, these results highlight the possible vulnerability of ratings to apparently small decisions. For example, the treatment of 33 patients with unexplained cardiomegaly or congestive heart failure who underwent angiography was classified as uncertain in the 1990 ratings, but would have been declared inappropriate under previous criteria10. This single modification in the criteria shifted the rate of inappropriate care from 6.5 to 4 percent, showing that apparently innocuous decisions can alter estimated rates substantially.

Similarly, the results for percutaneous transluminal coronary angioplasty reveal the potential importance of a panel's composition; 38 percent of patients underwent treatment of uncertain appropriateness, “[mostly] because the median panel rating was within the uncertain range (i.e., between 4-6)”9. In the parallel New York angiography study, 20 percent of the cases were classified as uncertain. In such settings, the shift of a single panel member's ratings from, say, 4 to 3 can readily alter estimated rates of appropriateness by shifting the median rating for specific indications from uncertain to inappropriate.
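The effect of a single rating on the median can be shown with a toy panel. The nine-member panel and its ratings are hypothetical; the category boundaries (1 to 3, 4 to 6, 7 to 9) are the customary ones described in this paper.

```python
from statistics import median


def classify(median_rating):
    """Map a median panel rating on the 1-to-9 scale to a category."""
    if median_rating <= 3:
        return "inappropriate"
    if median_rating <= 6:
        return "uncertain"
    return "appropriate"


# Hypothetical ratings of one indication by a nine-member panel.
before = [2, 3, 3, 3, 4, 4, 5, 6, 7]
# The same panel after one member changes a rating from 4 to 3.
after = [2, 3, 3, 3, 3, 4, 5, 6, 7]

print(median(before), classify(median(before)))
print(median(after), classify(median(after)))
```

Every patient whose record matches this indication changes category along with the median, so one rating can move many cases at once.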

Relatively frequent use of the “uncertain” category may also lower the sensitivity of the appropriateness method, increasing the chance that the estimated rate contains a downward bias that acts to offset the upward bias arising from the presence of false positives (equation 1). This is most common when the “uncertain” and “appropriate” categories are combined, as is frequently done.

Rates of inappropriate care have most often been estimated for geographic regions, but there is interest in applying the method to smaller units of observation as well. Brook notes that “if appropriateness is to be improved, it will have to be assessed directly at the level of each patient, hospital, and physician”12. Most troublesome for individual physicians would be the problem of poor specificity, which often labels an appropriate intervention as inappropriate. Troublesome for patients and payers would be poor sensitivity, which labels treatments as appropriate when in fact they are inappropriate. Of course, these methods cannot identify patients who could have received a beneficial treatment but did not, potentially a greater source of concern to patients.

Finally, the methods used to study appropriateness merely provide a refined way of recording conventional wisdom about the efficacy of medical therapies, wisdom that often stands without strong scientific support. They cannot substitute for careful analysis of the actual effectiveness of medical treatments. Methods based on reaching a consensus among experts do not create new scientific data; they only codify old beliefs. Greatly increased investments to provide a scientific basis for understanding when various treatments work, and for whom, will provide the best possible information for decision making. Decisions guided by scientific data must be better than those based only on consensus. Major new investments in studies of the effectiveness of medical treatments could perhaps accomplish this goal, and the expected payoff from such studies exceeds the costs of conducting them by several orders of magnitude23,24.

Supported in part by a grant (R01-5477) from the Agency for Health Care Policy and Research.

Source Information

From the Department of Community and Preventive Medicine, University of Rochester School of Medicine and Dentistry, 601 Elmwood Ave., Box 644, Rochester, NY 14642, where reprint requests should be addressed to Dr. Phelps.

Appendix

This Appendix employs the concept of the “true” state of health, unknown to both the community doctors whose practices are evaluated and the expert panel that provides the basis for that evaluation. Relative to this standard, both the community doctors and the expert panel make errors of judgment. In the following equations, A represents the fact that the community doctor has treated the patient, B the fact that the expert panel says the treatment is inappropriate, S the true (gold standard) condition of “sick” (i.e., treatment will benefit the patient), and H the true (gold standard) condition of “healthy” (i.e., treatment will not benefit the patient). The conditional probabilities can be defined as follows: P(B|A) is the estimated rate of inappropriate treatment, P(H|A) is the true rate of inappropriate treatment (so P(S|A) is 1 minus the true rate), P(B|H,A) is the sensitivity of the appropriateness method, and P(B|S,A) is 1 minus the specificity of the appropriateness method. Then, assuming the stochastic independence of A and B,

P(B|A) = P(B,S|A) + P(B,H|A) = [P(S|A) P(B|S,A) + P(H|A) P(B|H,A)] = (1 - true rate) × (1 - specificity) + true rate × sensitivity.

If the events A and B are not independent, then the joint distributions of P(B,S|A) and P(B,H|A) must be used, complicating the expression but generally not altering the basic insight into the problem.

References

  1. Brook RH, Park RE, Chassin MR, Solomon DH, Keesey J, Kosecoff J. Predicting the appropriate use of carotid endarterectomy, upper gastrointestinal endoscopy, and coronary angiography. N Engl J Med 1990;323:1173-1177.

  2. Chassin MR, Kosecoff J, Park RE, et al. Does inappropriate use explain geographic variations in the use of health care services? A study of three procedures. JAMA 1987;258:2533-2537.

  3. Leape LL, Park RE, Solomon DH, Chassin MR, Kosecoff J, Brook RH. Does inappropriate use explain small-area variations in the use of health care services? JAMA 1990;263:669-672.

  4. Winslow CM, Kosecoff JB, Chassin M, Kanouse DE, Brook RH. The appropriateness of performing coronary artery bypass surgery. JAMA 1988;260:505-509.

  5. Bernstein SJ, McGlynn EA, Siu AL, et al. The appropriateness of hysterectomy: a comparison of care in seven health plans. JAMA 1993;269:2398-2402.

  6. Gloor JE, Kissoon N, Joubert GI. Appropriateness of hospitalization in a Canadian pediatric hospital. Pediatrics 1993;91:70-74.

  7. Havens PL, Butler JC, Day SE, Mohr BA, Davis JP, Chusid MJ. Treating measles: the appropriateness of admission to a Wisconsin children's hospital. Am J Public Health 1993;83:379-384.

  8. Leape LL, Hilborne LH, Park RE, et al. The appropriateness of use of coronary artery bypass graft surgery in New York State. JAMA 1993;269:753-760.

  9. Hilborne LH, Leape LL, Bernstein SJ, et al. The appropriateness of use of percutaneous transluminal coronary angioplasty in New York State. JAMA 1993;269:761-765.

  10. Bernstein SJ, Hilborne LH, Leape LL, et al. The appropriateness of use of coronary angiography in New York State. JAMA 1993;269:766-769.

  11. Siu AL, Sonnenberg FA, Manning WG, et al. Inappropriate use of hospitals in a randomized trial of health insurance plans. N Engl J Med 1986;315:1259-1266.

  12. Brook RH. Practice guidelines and practicing medicine: are they compatible? JAMA 1989;262:3027-3030.

  13. Audet AM, Greenfield S, Field M. Medical practice guidelines: current activities and future directions. Ann Intern Med 1990;113:709-714.

  14. Brook RH, Kosecoff JB, Park RE, Chassin MR, Winslow CM, Hampton JR. Diagnosis and treatment of coronary artery disease: comparison of doctors' attitudes in the USA and the UK. Lancet 1988;1:750-753.

  15. Merrick NJ, Fink A, Brook RH, et al. Indications for selected medical and surgical procedures: a literature review and ratings of appropriateness: carotid endarterectomy. Santa Monica, Calif.: RAND, 1986. (Report no. R-3204/6.)

  16. Park RE, Fink A, Brook RH, et al. Physician ratings of appropriate indications for six medical and surgical procedures. Am J Public Health 1986;76:766-772.

  17. Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics 1980;36:167-171.

  18. Walter SD, Irwig LM. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. J Clin Epidemiol 1988;41:923-937.

  19. Henkelman RM, Kay I, Bronskill MJ. Receiver operator characteristic (ROC) analysis without truth. Med Decis Making 1990;10:24-29.

  20. Begg CB, Metz CE. Consensus diagnoses and “gold standards.” Med Decis Making 1990;10:29-30. [Erratum, Med Decis Making 1990;10:149.]

  21. Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory. New York: Academic Press, 1982.

  22. Phelps CE, Mushlin AI. Focusing technology assessment using medical decision theory. Med Decis Making 1988;8:279-289.

  23. Phelps CE, Parente ST. Priority setting in medical technology and medical practice assessment. Med Care 1990;28:703-723. [Erratum, Med Care 1992;30:744-51.]

  24. Phelps CE, Mooney C. Correction and update on “Priority setting in medical technology assessment.” Med Care 1992;30:744-751.

    PARAMJIT S. CHANDHOKE, EDWARD deANTONI. (1998) Cost-Effectiveness Analysis: Application to Endourology. Journal of Endourology 12:6, 485-491
    CrossRef

  37. 37

    Paul G. Shekelle, Mark R. Chassin, R. E. Park. (1998) Assessing the Predictive Validity of the RAND/UCLA Appropriateness Method Criteria for Performing Carotid Endarterectomy. International Journal of Technology Assessment in Health Care 14:04, 707
    CrossRef

  38. 38

    Shekelle, Paul G., Kahan, James P., Bernstein, Steven J., Leape, Lucian L., Kamberg, Caren J., Park, R.E., . (1998) The Reproducibility of a Method to Identify the Overuse and Underuse of Medical Procedures. New England Journal of Medicine 338:26, 1888-1895
    Full Text

  39. 39

    Ayanian, John Z., Landrum, Mary Beth, Normand, Sharon-Lise T., Guadagnoli, Edward, McNeil, Barbara J., . (1998) Rating the Appropriateness of Coronary Angiography — Do Practicing Physicians Agree with an Expert Panel and with Each Other?. New England Journal of Medicine 338:26, 1896-1904
    Full Text

  40. 40

    Dieter Köhler, Gerd Goeckenjan, Jörg Rünz. (1998) Evolutionäre Qualitätssicherung. Medizinische Klinik 93:3, 191-196
    CrossRef

  41. 41

    Jochanan Benbassat, Mark Taragin. (1998) What is adequate health care and how can quality of care be improved?. International Journal of Health Care Quality Assurance 11:2, 58-64
    CrossRef

  42. 42

    Stephen A. Buetow, Bonnie Sibbald, Judith A. Cantrill, Shirley Halliwell. (1997) Appropriateness in health care: Application to prescribing. Social Science & Medicine 45:2, 261-271
    CrossRef

  43. 43

    David L. Witte. (1997) The complex connections between test properties and relevant outcomes: widening the perspective. Clinica Chimica Acta 260:2, 117-129
    CrossRef

  44. 44

    Tania Larequi-Lauber, John-Paul Vader, Bernard Burnand, Robert H. Brook, Jacqueline Kosecoff, Dorith Sloutskis, Heinz Fankhauser, Jean Berney, Nicolas de Tribolet, Fred Paccaud. (1997) Appropriateness of Indications for Surgery of Lumbar Disc Hernia and Spinal Stenosis. Spine 22:2, 203-209
    CrossRef

  45. 45

    Richard S. Eisenstaedt. (1997) Modifying physicians' transfusion practice. Transfusion Medicine Reviews 11:1, 27-37
    CrossRef

  46. 46

    James P. AuBuchon. (1996) The Role of Decision Analysis in Transfusion Medicine. Vox Sanguinis 71:1, 1-5
    CrossRef

  47. 47

    MIRIAM KOMAROMY, NICOLE LURIE, DENNIS OSMOND, KAREN VRANIZAN, DENNIS KEANE, ANDREW B. BINDMAN. (1996) Physician Practice Style and Rates of Hospitalization for Chronic Medical Conditions. Medical Care 34:6, 594-609
    CrossRef

  48. 48

    Andrew Miles, Declan O'Neill, Andreas Polychronis. (1996) Central dimensions of clinical practice evaluation: efficiency, appropriateness and effectiveness - II. Journal of Evaluation in Clinical Practice 2:2, 131-152
    CrossRef

  49. 49

    Thomas H. Lee. (1996) Beyond guidelines. Journal of General Internal Medicine 11:3, 174-175
    CrossRef

  50. 50

    J.H. Kingma. (1995) Waiting for coronary artery bypass surgery: abusive, appropriate, or acceptable?. The Lancet 346:8990, 1570-1571
    CrossRef

  51. 51

    Thomas H. Lee. (1995) How do cardiologists fit in managed care?. Journal of the American College of Cardiology 26:6, 1492-1493
    CrossRef

  52. 52

    Jeremy Wyatt DM MRCP. (1995) Acquisition and use of clinical data for audit and research. Journal of Evaluation in Clinical Practice 1:1, 15-27
    CrossRef

  53. 53

    Robert L. Kane. (1995) Creating Practice Guidelines: The Dangers of Over-Reliance on Expert Judgment. The Journal of Law, Medicine & Ethics 23:1, 62-64
    CrossRef

  54. 54

    Mark V. Pauly. (1995) Practice Guidelines: Can They Save Money? Should They?. The Journal of Law, Medicine & Ethics 23:1, 65-74
    CrossRef

  55. 55

    Kenneth Rockwood. (1995) Integration of Research Methods and Outcome Measures: Comprehensive Care for the Frail Elderly. Canadian Journal on Aging / La Revue canadienne du vieillissement 14:S1, 151-164
    CrossRef

  56. 56

    Jeffrey Braithwaite. (1994) How viable is Victoria's funding policy?. Australian Journal of Public Health 18:4, 355-357
    CrossRef

  57. 57

    S. Mussurakis, A. Sprigg, G.M. Steiner. (1994) The appropriateness of use and the clinical impact of micturating cystourethrography in paediatric practice. Clinical Radiology 49:8, 541-545
    CrossRef

  58. 58

    Sylvan Lee Weinberg. (1994) President's page: Quality, appropriateness and outcomes—How accurate are our measures?. Journal of the American College of Cardiology 23:3, 824-825
    CrossRef

  59. 59

    (1994) Appropriateness Studies. New England Journal of Medicine 330:6, 432-434
    Full Text

  60. 60

    Kassirer, Jerome P., . (1993) The Quality of Care and the Quality of Measuring It. New England Journal of Medicine 329:17, 1263-1265
    Full Text

Letters