Special Article

Performance of Four Computer-Based Diagnostic Systems

Eta S. Berner, George D. Webster, Alwyn A. Shugerman, James R. Jackson, James Algina, Alfred L. Baker, Eugene V. Ball, C. Glenn Cobbs, Vincent W. Dennis, Eugene P. Frenkel, Leonard D. Hudson, Elliott L. Mancall, Charles E. Rackley, and O. David Taunton

N Engl J Med 1994; 330:1792-1796. June 23, 1994

Abstract

Background

Computer-based diagnostic systems are available commercially, but there has been limited evaluation of their performance. We assessed the diagnostic capabilities of four internal medicine diagnostic systems: Dxplain, Iliad, Meditel, and QMR.

Methods

Ten expert clinicians created a set of 105 diagnostically challenging clinical case summaries involving actual patients. Clinical data were entered into each program with the vocabulary provided by the program's developer. Each of the systems produced a ranked list of possible diagnoses for each patient, as did the group of experts. We calculated scores on several performance measures for each computer program.

Results

No single computer program scored better than the others on all performance measures. Among all cases and all programs, the proportion of correct diagnoses ranged from 0.52 to 0.71, and the mean proportion of relevant diagnoses ranged from 0.19 to 0.37. On average, less than half the diagnoses on the experts' original list of reasonable diagnoses were suggested by any of the programs. However, each program suggested an average of approximately two additional diagnoses per case that the experts found relevant but had not originally considered.

Conclusions

The results provide a profile of the strengths and limitations of these computer programs. The programs should be used by physicians who can identify and use the relevant information and ignore the irrelevant information that can be produced.

Media in This Article

Figure 1. Proportion of Cases with a Correct Diagnosis in the Computer, According to the Cutoff Point Establishing the Numbers of Diagnoses Listed.
Table 1. Performance Scores of the Computer-Based Diagnostic Systems.
Article

Over the past 20 years, computer-based systems designed to support clinical decision making have evolved from prototypes to commercially available systems1-10. Although many of these systems address narrow areas of subject matter, such as electrolyte and acid-base disorders,2 diagnostic computer-based systems intended to address the entire field of internal medicine have gained increasing visibility11-16. Although most of these systems are generally designed to provide efficient access to medical information, they also include mechanisms for the assessment of clinical and laboratory data and the provision of diagnostic advice. As such systems become more widespread, evaluation of their diagnostic accuracy and usefulness to physicians is necessary. Studies of accuracy whose results have been reported have generally involved individual programs, a limited number and type of cases, and varying criteria and measures of performance15,17-32.

This study evaluated the ability of four programs -- Dxplain (PC version 4.5),33 Iliad (version 4.0),34 Meditel (version 2.0),35 and QMR (version 2.03)36 -- to suggest appropriate diagnoses to account for a set of clinical data. We used the same diagnostically challenging cases for each of the four systems and developed a number of measures of performance. We incorporated principles used in the development of specialty-board certification examinations to provide reliable estimates of performance -- namely, a prospectively determined set of test specifications and an adequate number of cases, with an appropriate range of content and difficulty.

The four programs we studied have all been the subject of published research on their development, evaluation, and application11,12,14,15,19-32,37-43. Although they all incorporate expert judgment, they differ in the data used to determine their probability estimates, the extent to which diseases and related clinical data are addressed in their knowledge bases, the particular vocabulary they require to describe clinical data, and the algorithms they use to combine and analyze data. Iliad34 and Meditel35 use Bayesian logic, but they differ in the assignment of prior probabilities, in specific decision rules, and in the use of expert judgment. Dxplain33 and QMR36 use non-Bayesian algorithms, but they incorporate semiquantitative scales to express the probabilistic association of findings (signs and symptoms) with particular diagnoses, and they use these scales to derive a weighted assessment of the patients' combined signs and symptoms. After the data are entered, each program produces a list of diagnostic possibilities, ranked in order of likelihood. In general, none of the programs include a time-dependent dimension with regard to the appearance, sequence, or duration of signs and symptoms.

Methods

Construction of the Test

All the cases involved the entire field of general medicine, including neurology. They were selected to present a spectrum of diagnostic difficulty but were all considered to be cases in which a physician might be prompted to seek diagnostic help from a colleague, in that they included atypical presentations, rare diseases, multiple disorders presenting simultaneously, or elements sufficiently complex that the physician would be likely to request a diagnostic consultation. All the cases were based on real patients. Those in which the principal challenge involved a choice among therapeutic options were excluded.

Each member of a group of 10 nationally recognized consultants in the fields of general internal medicine, eight subspecialties of internal medicine, and neurology contributed 15 detailed clinical summaries describing patients who had been referred for diagnostic consultation. The summaries included data (history, findings of physical examination, and results of laboratory tests) that were available at the time of the initial consultation and that indicated both normal and abnormal conditions. We omitted data collected subsequently at the consultant's direction; these usually included the definitive test that confirmed the diagnosis. Because the clinical data pertained to real patients, a few cases included vague descriptions by patients of their symptoms, earlier diagnoses that may not have been accurate, or normal results of laboratory tests forwarded by the referring physician that, when the tests were repeated later, were found to be abnormal. The group of experts arrived at a consensus on the diagnoses that were appropriate to consider in each case. They categorized each case according to the organ system or systems involved, the cause of disease, and the diagnostic difficulty. The experts then reviewed the cases to ensure that the test had an appropriate range of difficulty, that the weight given to the major organ systems was approximately equal, and that there was an appropriate gold standard for the diagnosis designated as correct in each case (i.e., a definitive diagnostic test or finding at autopsy or a consensus of experts when no definitive test could confirm the diagnosis). After this review, 120 of the original 150 cases were selected for further consideration.

Analyses of Cases

We attempted to include all the data in the written case descriptions, not just the especially pertinent ones. To ensure that data entry was optimal, we asked the program developers to indicate how they would enter specific clinical data in their particular programs. Bias in vocabulary selection that might have occurred if the program developers had chosen the vocabulary used in a specific context was avoided by having them express in the language of their program a master list of discrete data, collected from all the cases and listed alphabetically under the general categories of history, physical examination, and laboratory assessment. We then entered the data from each case into each program, using the developers' terms for the clinical data on the master list. Because of the limitations of individual systems, some data could only be approximated in some programs, or could not be entered at all. The data were analyzed by each program, and each produced a list of possible diagnoses for the case, ranked according to likelihood. All the analyses were carried out with versions of the four programs available in 1992.

After the programs had generated lists of diagnoses for a case, the top 20 diagnoses on each list were combined in a master list. Without knowing which program had suggested which diagnosis, the group of experts reviewed the diagnoses on the master lists for appropriateness, attempting to determine whether the programs had suggested any additional diagnoses that were appropriate and whether any cases should be eliminated because of ambiguity other than that associated with the performance of an individual program. One hundred ten cases remained after this validation stage. An additional five cases were deleted from the final test because they contained too few items to be run on some of the programs. One hundred five cases remained, including diagnoses such as giant-cell arteritis, histiocytosis X, ankylosing spondylitis, distal renal tubular acidosis, dissecting aortic aneurysm with infarction of spinal cord, thyroid carcinoma, pneumococcal pneumonia and bacteremia, Hodgkin's disease, gastric ulcer, and pericardial constriction.

We next determined the percentage of the diagnoses arrived at for each case that were included in the knowledge base of each program and calculated five scores to characterize the program's performance. The first two scores were based on the entire list of diagnoses that the program generated. The score for Correct Diagnosis is the proportion of cases in which the diagnosis list generated by the computer included the correct diagnosis or one closely related to it. This variable is analogous to the concept of sensitivity. The score for Rank is the average rank of the correct (or closely related) diagnosis as it appears on the computer-generated list. Three other scores were derived by reviewing the first 20 diagnoses listed by each program. Like the score for Correct Diagnosis, the Comprehensiveness score is based on the list of appropriate diagnoses originally developed by the group of experts. The Comprehensiveness score is the average proportion of the appropriate diagnoses agreed on by the experts that is included on a computer-generated list. It reflects the extent to which the computer suggested all the diagnoses that the experts originally thought should be suggested. In some instances, the programs proposed diagnoses that the experts had not originally listed but that in retrospect they agreed were reasonable to consider. These diagnoses were the basis of two more scores. The score for Relevance is the average proportion of computer-generated diagnoses that the experts found reasonable to consider, given the clinical data. These diagnoses included the correct one and others that reflected an appropriate integration of the data. This score is conceptually, but not computationally, related to the notion of specificity. Finally, the score for Additional Diagnoses reflects the average number of additional diagnoses suggested by the computer that the experts considered appropriate after their final review of the cases.
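The five scores defined above can be illustrated for a single case with a short Python sketch. It is an illustration under stated assumptions, not the scoring software used in the study: the `CaseResult` fields, the function names, and the example diagnoses are hypothetical, diagnoses are matched by exact string, and the `relevant` set is assumed to hold both the experts' original list and the diagnoses they accepted on final review.

```python
from dataclasses import dataclass

TOP_N = 20  # the three list-based scores look at the first 20 diagnoses


@dataclass
class CaseResult:
    """One program's output for one case (hypothetical structure)."""
    program_list: list  # ranked diagnoses from the program, most likely first
    correct: set        # correct or closely related diagnoses (gold standard)
    expert_list: set    # appropriate diagnoses on the experts' original list
    relevant: set       # all diagnoses the experts judged reasonable on final review


def correct_diagnosis(case):
    """1 if the correct diagnosis appears anywhere on the full list, else 0."""
    return int(any(d in case.correct for d in case.program_list))


def rank(case):
    """1-based position of the first correct diagnosis, or None if absent."""
    for position, d in enumerate(case.program_list, start=1):
        if d in case.correct:
            return position
    return None


def comprehensiveness(case):
    """Share of the experts' original list found in the top 20 (list assumed non-empty)."""
    top = set(case.program_list[:TOP_N])
    return len(top & case.expert_list) / len(case.expert_list)


def relevance(case):
    """Share of the top 20 that the experts judged reasonable (list assumed non-empty)."""
    top = case.program_list[:TOP_N]
    return sum(d in case.relevant for d in top) / len(top)


def additional_diagnoses(case):
    """Count of reasonable top-20 diagnoses that were not on the experts' original list."""
    top = set(case.program_list[:TOP_N])
    return len((top & case.relevant) - case.expert_list)


# Invented example data, for illustration only.
example = CaseResult(
    program_list=["pneumonia", "giant-cell arteritis", "lymphoma", "gastric ulcer"],
    correct={"giant-cell arteritis"},
    expert_list={"giant-cell arteritis", "polymyalgia rheumatica"},
    relevant={"giant-cell arteritis", "polymyalgia rheumatica", "lymphoma"},
)
```

Averaging each function over all cases would give per-program means of the kind summarized in the performance table, with Rank averaged only over the cases in which it is not `None`.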

Statistical Analysis

For each program, we calculated means and 95 percent confidence intervals for each score on the basis of primary case diagnoses. These calculations were made for all 105 cases and also for the 63 cases whose correct diagnoses were contained in the knowledge bases of all four computer systems. The scores for Rank were based only on the cases for which the computer suggested the correct diagnosis; as a result, in that analysis the number of cases included varied according to the program.
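As a rough illustration of this step, the sketch below computes a mean and a 95 percent confidence interval from per-case scores. The article does not state how the intervals were computed, so the normal approximation (mean plus or minus 1.96 standard errors) is an assumption, as are the function name `mean_ci95` and the sample values.

```python
import math
import statistics


def mean_ci95(scores):
    """Return (mean, lower, upper) using a normal-approximation 95% interval.

    Assumes at least two per-case scores; the 1.96 multiplier is the
    large-sample normal quantile, chosen here for illustration only.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    half_width = 1.96 * standard_error
    return mean, mean - half_width, mean + half_width


# Invented per-case Relevance scores for one program.
mean, lower, upper = mean_ci95([0.2, 0.4, 0.6, 0.8])
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

For the Rank score, the same function would be applied only to the cases in which the program listed the correct diagnosis, which is why the number of cases differs by program.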

The overall difference between program means on the performance scores was tested for statistical significance with a multivariate repeated-measures analysis of variance44. In the case of dichotomous case scores, the procedure described by Guthrie was used45. A separate analysis of variance was conducted for each score except the score for Rank, since rankings were not available for all cases. Statistically significant analyses of variance were followed with pairwise comparisons between systems46. As with the overall analysis of variance, the pairwise comparisons were also adjusted for dichotomous case scores, with use of the procedures described by Guthrie45. An alpha level of 0.05 was chosen to indicate statistical significance in all tests.

To study how the score for Correct Diagnosis would change with a more stringent cutoff point for the lists of diagnoses, the scores for Correct Diagnosis were examined at various cutoff points. A two-factor repeated-measures analysis of variance was used to test for a statistically significant interaction between program and cutoff point.
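The cutoff analysis can be pictured as recomputing the Correct Diagnosis score while truncating each program's list. The sketch below is a minimal illustration, not the study's code: the function name, the use of `None` for "entire list" and for "correct diagnosis not listed", and the example ranks are all assumptions.

```python
def proportion_correct_at_cutoffs(ranks, cutoffs):
    """Proportion of cases whose correct diagnosis falls within each cutoff.

    ranks:   one entry per case, the 1-based rank of the correct diagnosis,
             or None if the program never listed it.
    cutoffs: list lengths to evaluate; None means the entire list.
    """
    n_cases = len(ranks)
    result = {}
    for cutoff in cutoffs:
        hits = sum(
            1 for r in ranks
            if r is not None and (cutoff is None or r <= cutoff)
        )
        result[cutoff] = hits / n_cases
    return result


# Invented ranks for five cases from one program.
example_ranks = [1, 3, 12, None, 25]
by_cutoff = proportion_correct_at_cutoffs(example_ranks, [1, 5, 10, 15, 20, None])
for cutoff, proportion in by_cutoff.items():
    label = "entire list" if cutoff is None else f"top {cutoff}"
    print(f"{label}: {proportion:.2f}")
```

A program whose correct diagnoses tend to appear near the top of short lists can lead at strict cutoffs yet trail when the entire list is counted, which is the kind of program-by-cutoff interaction the analysis of variance tests for.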

Results

Table 1 shows the proportion of the 105 cases for which the correct diagnosis was included in the knowledge bases of all four computer programs, as well as the scores obtained by each program on each performance variable. For each variable, results are shown both for the total number of cases and for the number of cases with diagnoses included in the knowledge bases of all four programs -- 105 and 63 cases, respectively, except in the case of Rank, for which the number of cases used varied according to the program. The numbers of cases on which the scores for Rank were based are included in a footnote to the table.

Knowledge Base

The proportion of the primary case diagnoses included in the knowledge bases of the individual programs ranged from 0.73 to 0.91. This value was significantly higher for Dxplain than for Iliad and QMR, and it was significantly higher for Meditel than for QMR. Three diagnoses were not included in any of the knowledge bases.

Correct Diagnosis

When all the cases were considered, scores for Correct Diagnosis ranged from 0.52 to 0.71 among the four computer programs. The mean scores for Dxplain and Meditel were significantly higher than the score for QMR. For nine cases, none of the programs included the correct diagnosis.

Using the scores for Correct Diagnosis, Figure 1 shows the proportion of cases in which the correct diagnosis was the first diagnosis listed, the proportion in which it was listed as 1 of the top 5 diagnoses, as 1 of the top 10, and so forth. There was a significant interaction between the program and the cutoff point used (chi-square = 70.28, 21 df, P<0.001); QMR had the highest score for Correct Diagnosis when the top 10 diagnoses were studied but the lowest score when the entire list was used. The programs were least distinguishable from one another with regard to Correct Diagnosis when cutoff points of 15 and 20 diagnoses were used.

In the analysis of the 63 cases whose diagnoses were included in all four knowledge bases, the scores for Correct Diagnosis ranged from 0.71 to 0.89. As would be expected, the mean score for each program was higher when the sample studied was limited to cases with diagnoses in the knowledge base of the program. The differences between programs were not statistically significant. Among the 63 cases, there was only 1 for which none of the programs suggested the correct diagnosis.

Rank

Among the cases for which each system generated a correct diagnosis, the mean rank of that diagnosis on the computer-generated list ranged from 6.6 to 13.3. For cases whose diagnoses were contained in all four knowledge bases, the mean rank of the correct diagnosis ranged from 5.4 to 12.0. Because the samples varied in size, the significance of the differences could not be calculated.

Relevance

The mean scores for Relevance ranged from 0.19 to 0.37 when the entire sample was studied. The mean score for QMR was significantly higher than those for the other programs, and the mean score for Iliad significantly lower than those for the other programs. When the 63 cases whose diagnoses were included in all four knowledge bases were studied, the scores ranged from 0.21 to 0.46. The scores for QMR were still significantly higher than those for all the other systems, but the only other significant difference was that the score for Dxplain was significantly higher than that for Iliad.

Comprehensiveness

Among the four programs, the mean scores for Comprehensiveness ranged from 0.25 to 0.38 when all cases were studied and from 0.27 to 0.39 when the 63 cases whose diagnoses were included in all four knowledge bases were studied. In both analyses, the mean scores for Dxplain and Meditel were significantly higher than those for Iliad and QMR.

Additional Diagnoses

Approximately six appropriate diagnoses per case appeared on the lists originally compiled by the experts. When either all 105 cases or the sample of 63 cases whose correct diagnoses were included in the knowledge base were studied, each computer program generated an average of approximately two appropriate diagnoses that had not originally been listed. There were no significant differences among the systems with regard to this variable.

Discussion

In the evaluation of computer-based diagnostic systems, two major issues need to be addressed: accuracy and usefulness. This study addresses only the first issue and is focused only on the ability of a system to generate diagnostic hypotheses from a set of data pertaining to a case. The study involved developing a set of cases that were real and diagnostically challenging. Although the programs are not expressly designed for any particular group of physicians, the experts considered that the use of diagnostic systems would probably be important for physicians presented with difficult clinical problems for which they might seek consultation. A problem might be challenging because it involved atypical findings for a common disease, a rare disease, or the interaction of multiple diseases. The most common seekers of such consultations were considered to be primary care physicians or subspecialists needing assistance outside their area of expertise. The clinical cases were chosen for a representative balance of these types of problems, as well as a balance of problems among organ systems. Patients being referred for diagnostic assistance to a broad spectrum of experts were considered an appropriate source of such cases. Although the resulting cases are likely to represent only a small portion of a generalist's normal case load, they are likely to represent a larger portion than a case sample that is limited to clinicopathological conferences and they may, in fact, reflect a large portion of the cases for which diagnostic help is sought.

The programs all produced moderately long lists of potential diagnoses. The lists included many diagnoses that a knowledgeable physician would regard as not being particularly helpful in explaining the case or guiding further studies. On the other hand, each program suggested some diagnoses, though not highly likely ones, that the experts later agreed were worthy of inclusion in the differential diagnosis.

Although each program performed better or worse than others on some of the performance measures, none performed consistently better or worse on all the measures. In many cases the differences, even when statistically significant, were not large. The relative importance of the measures is likely to depend on the individual user's preferences and needs. One of the greatest differences concerned the proportion of case diagnoses in the knowledge bases of the programs (range, 0.73 to 0.91). This variable may explain some of the differences in the overall scores for Correct Diagnosis and Comprehensiveness. The scores for Rank indicate where the diagnosis that was ultimately found to be correct appeared on the list of computer-generated diagnoses. For an atypical case, the correct diagnosis might appropriately be ranked fairly low if other diagnoses were more likely on the basis of the available data. For this reason, some system developers have emphasized that for the appropriate diagnoses to be included on the list at all is more important than their rank. It should also be remembered that the scores for Comprehensiveness and Additional Diagnoses both depend on the number of diagnoses in the initial expert consensus. Since the experts tried to list all the diagnoses that should be considered, the scores for Comprehensiveness and Additional Diagnoses are likely to be lower than they would be if the list had only included a few of the reasonable diagnoses.

Although the sensitivity and specificity of the programs tested in this highly focused study were not impressive, the programs have additional functions that we did not evaluate. These functions, many of which are interactive, include displaying the signs and symptoms associated with diseases, suggesting potentially relevant laboratory tests, and proposing alternative workup strategies. In addition, these programs provide scores that indicate the relative likelihood of each diagnosis. In this study, only the ranking on the diagnosis lists was used, rather than these likelihood scores.

The increasing popularity of computer-based diagnostic systems suggests that at least some physicians have found them helpful. However, such anecdotal data do not permit a systematic assessment of the clinical contexts in which these programs are most useful or of how they actually perform. Our study arouses concern that important diagnostic considerations may be so obscured by other diagnoses that the value of the program may be significantly decreased, or that it could lead to excessive or costly interventions in inexperienced hands. However, results indicating low sensitivity and specificity do not in themselves show how these systems perform in a clinical setting. Although some clinicians may use one of these programs as described here, most would probably enter selected key findings and use some of the other functions of the system to refine the list of diagnoses. Medically knowledgeable persons would probably not only decide what data to enter, but also distinguish between diagnoses that are worthy of consideration and dismiss many of the poorly integrated diagnoses47. The developers of these systems intend these programs to serve a prompting function, reminding physicians of diagnoses they may not have considered or triggering their thinking about related diagnostic possibilities11,23. Clearly, as others have indicated, the next step in the evaluation of these programs will have to include examining the performance of the physician and the computer together48-50.

Supported by a grant (LM05125) from the National Library of Medicine.

We are indebted to Faith Fitzgerald, M.D., for her contributions to the deliberations of the group of experts and her insightful comments on an earlier draft; to G. Octo Barnett, M.D., Randolph A. Miller, M.D., Homer Warner, Jr., Herbert S. Waxman, M.D., and William E. Worley, M.D., the developers of the diagnostic decision support systems, and their colleagues, Nuncia Giuse, M.D., Marvin Packer, M.D., and Hong Yu, M.D., for providing data; to Ms. Janice S. Pulliam for her diligent efforts as a research assistant; and to Ms. Mary Sue B. Pruett for her assistance in the preparation of the manuscript.

Source Information

From the University of Alabama at Birmingham (E.S.B., A.A.S., J.R.J., E.V.B., C.G.C.); InforMed, Inc., St. Davids, Pa. (G.D.W.); the University of Florida, Gainesville (J.A.); the University of Chicago, Chicago (A.L.B.); the Cleveland Clinic Foundation, Cleveland (V.W.D.); the University of Texas, Dallas (E.P.F.); the University of Washington, Seattle (L.D.H.); Hahnemann University, Philadelphia (E.L.M.); Georgetown University, Washington, D.C. (C.E.R.); and Baptist Medical Center Montclair, Birmingham, Ala. (O.D.T.).

This study was conducted by the Office of Educational Development, University of Alabama at Birmingham School of Medicine, 933 19th St. South, Birmingham, AL 35294-2041, where reprint requests should be addressed to Dr. Berner.

References

  1. Barnett GO. The computer and clinical judgment. N Engl J Med 1982;307:493-494.

  2. Bleich HL. The computer as a consultant. N Engl J Med 1971;284:141-147.

  3. de Dombal FT. Computer-aided decision support in clinical medicine. Int J Biomed Comput 1989;24:9-16.

  4. DeTore AW. Medical informatics: an introduction to computer technology in medicine. Am J Med 1988;85:399-403.

  5. Miller RA. Medical diagnostic decision support systems -- past, present, and future. J Am Med Informatics Assoc 1994;1:8-27.

  6. Reggia JA, Tuhrim S, eds. Computer-assisted medical decision making. New York: Springer-Verlag, 1985.

  7. Schwartz WB, Patil RS, Szolovits P. Artificial intelligence in medicine: where do we stand? N Engl J Med 1987;316:685-688.

  8. Shortliffe EH. Computer programs to support clinical decision making. JAMA 1987;258:61-66.

  9. Shortliffe EH. The adolescence of AI in medicine: will the field come of age in the '90s? Artif Intell Med 1993;5:93-106.

  10. Shortliffe EH, Perreault LE, eds. Medical informatics: computer applications in healthcare. Reading, Mass.: Addison-Wesley, 1990.

  11. Barnett GO, Cimino JJ, Hupp JA, Hoffer EP. DXplain: an evolving diagnostic decision-support system. JAMA 1987;258:67-74.

  12. Miller R, Masarie FE, Myers JD. Quick Medical Reference (QMR) for diagnostic assistance. MD Comput 1986;3:34-48.

  13. Trace D, Evens M, Naeymi-Rad F, Carmony L. Medical information management: the MEDAS approach. In: Miller RA, ed. Proceedings: the Fourteenth Annual Symposium on Computer Applications in Medical Care. New York: IEEE Computer Society Press, 1990:635-9.

  14. Warner HR Jr. Iliad: moving medical decision-making into new frontiers. Methods Inf Med 1989;28:370-372.

  15. Waxman HS, Worley WE. Computer-assisted adult medical diagnosis: subject review and evaluation of a new microcomputer-based system. Medicine (Baltimore) 1990;69:125-136.

  16. Weed LL. Knowledge coupling: new premises and new tools for medical care and education. New York: Springer-Verlag, 1991.

  17. Georgakis DC, Trace DA, Naeymi-Rad F, Evens M. A statistical evaluation of the diagnostic performance of MEDAS -- the Medical Emergency Decision Assistance System. In: Miller RA, ed. Proceedings: the Fourteenth Annual Symposium on Computer Applications in Medical Care. New York: IEEE Computer Society Press, 1990:815-9.

  18. Nelson SJ, Blois MS, Tuttle MS, et al. Evaluating RECONSIDER: a computer program for diagnostic prompting. J Med Syst 1985;9:379-388.

  19. Hammersley JR, Cooney K. Evaluating the utility of available differential diagnosis systems. In: Greenes RA, ed. Proceedings: the Twelfth Annual Symposium on Computer Applications in Medical Care. New York: IEEE Computer Society Press, 1988:229-31.

  20. Feldman MJ, Barnett GO. An approach to evaluating the accuracy of DXplain. Comput Methods Programs Biomed 1991;35:261-266.

  21. Heckerling PS, Elstein AS, Terzian CG, Kushner MS. The effect of incomplete knowledge on the diagnosis of a computer consultant system. Med Inform (Lond) 1991;16:363-370.

  22. Lau LM, Warner HR. Performance of a diagnostic system (Iliad) as a tool for quality assurance. Comput Biomed Res 1992;25:314-323.

  23. Barness LA, Tunnessen WW Jr, Worley WE, Simmons TL, Ringe TBK Jr. Computer-assisted diagnosis in pediatrics. Am J Dis Child 1974;127:852-858.

  24. O'Shea JS. Computer-assisted pediatric diagnosis. Am J Dis Child 1975;129:199-202.

  25. Swender PT, Tunnessen WW Jr, Oski FA. Computer-assisted diagnosis. Am J Dis Child 1974;127:859-861.

  26. Wexler JR, Swender PT, Tunnessen WW Jr, Oski FA. Impact of a system of computer-assisted diagnosis: initial evaluation of the hospitalized patient. Am J Dis Child 1975;129:203-205.

  27. Bankowitz RA, Lave JR, McNeil MA. A method for assessing the impact of a computer-based decision support system on health care outcomes. Methods Inf Med 1992;31:3-10.

  28. Bankowitz RA, McNeil MA, Challinor SM, Parker RC, Kapoor WN, Miller RA. A computer-assisted medical diagnostic consultation service: implementation and prospective evaluation of a prototype. Ann Intern Med 1989;110:824-832.

  29. Bankowitz RA, McNeil MA, Challinor SM, Miller RA. Effect of a computer-assisted general medicine diagnostic consultation service on housestaff diagnostic strategy. Methods Inf Med 1989;28:352-356.

  30. Berman L, Miller RA. Problem area formation as an element of computer aided diagnosis: a comparison of two strategies within Quick Medical Reference (QMR). Methods Inf Med 1991;30:90-95.

  31. Middleton B, Shwe MA, Heckerman DE, et al. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base. II. Evaluation of diagnostic performance. Methods Inf Med 1991;30:256-267.

  32. Miller RA, Pople HE Jr, Myers JD. Internist-I, an experimental computer-based diagnostic consultant for general internal medicine. N Engl J Med 1982;307:468-476.

  33. DXPLAIN. Boston: Massachusetts General Hospital, 1992.

  34. ILIAD. Salt Lake City: Applied Informatics, 1992.

  35. MEDITEL: computer assisted diagnosis. Devon, Pa.: Meditel, 1991.

  36. QMR (Quick medical reference). Pittsburgh: CAMDAT, 1992.

  37. Bankowitz RA, Blumenfeld BH, Giuse Bettinsoli N, et al. User variability in abstracting and entering printed case histories with QUICK MEDICAL REFERENCE (QMR). In: Stead WW, ed. Proceedings: the Eleventh Annual Symposium on Computer Applications in Medical Care. New York: IEEE Computer Society Press, 1987:68-73.

  38. Bankowitz RA, Miller JK, Janosky J. A prospective analysis of inter-rater agreement between a physician and a physician's assistant in selecting QMR vocabulary terms. In: Clayton PD, ed. Proceedings: the Fifteenth Annual Symposium on Computer Applications in Medical Care. New York: McGraw-Hill, 1991:609-13.

  39. First MB, Soffer LJ, Miller RA. QUICK (Quick Index to Caduceus Knowledge): using the INTERNIST-1/CADUCEUS knowledge base as an electronic textbook of medicine. Comput Biomed Res 1985;18:137-165.

  40. Giuse DA, Giuse NB, Miller RA. Towards computer-assisted maintenance of medical knowledge bases. Artif Intell Med 1990;2:21-33.

  41. 41

    Masarie FE Jr, Miller RA, Myers JD. INTERNIST-1 properties: representing common sense and good medical practice in a computerized medical knowledge base. Comput Biomed Res 1985;18:458-479

  42.

    Miller RA, Masarie FE Jr. The Quick Medical Reference (QMR) relationships function: description and evaluation of a simple, efficient “multiple diagnoses” algorithm. In: Lun KC, Degoulet P, Piemme T, Rienhoff O, eds. Medinfo 1992: proceedings of the Seventh World Congress on Medical Informatics. Amsterdam: Elsevier, 1992:512-8.

  43.

    Miller RA, McNeil MA, Challinor SM, Masarie FE Jr, Myers JD. The INTERNIST-1/QUICK MEDICAL REFERENCE project -- status report. West J Med 1986;145:816-822

  44.

    Vonesh EF, Schork MA. Sample sizes in the multivariate analysis of repeated measurements. Biometrics 1986;42:601-610

  45.

    Guthrie D. Analysis of dichotomous variables in repeated measures experiments. Psychol Bull 1981;90:189-195

  46.

    Shaffer JP. Modified sequentially rejective multiple test procedures. J Am Stat Assoc 1986;81:826-831

  47.

    Rand TG. Medical knowledge bases free the mind for problem solving. ACP Obs 1992;12:10-11

  48.

    Salomon G, Perkins DN, Globerson T. Partners in cognition: extending human intelligence with intelligent technologies. Educ Res 1991;20:2-9

  49.

    Miller RA, Masarie FE Jr. The demise of the “Greek Oracle” model for medical diagnostic systems. Methods Inf Med 1990;29:1-2

  50.

    Miller RA. Why the standard view is standard: people, not machines, understand patients' problems. J Med Philos 1990;15:581-591
