Correspondence

Variability in the Interpretation of Mammograms

N Engl J Med 1995; 332:1171-1173. April 27, 1995

To the Editor:

The findings of Elmore et al. (Dec. 1 issue)1 attest to the subjectivity and gross nature of mammographic findings. Considering that pathologists struggle with an accurate diagnosis even at more than 100 times the magnification of a mammogram, it is highly unlikely that greater accuracy in mammographic diagnosis will ever be achieved with current techniques. Unfortunately, the news media, having previously misled the public by overemphasizing the diagnostic potential of mammograms, are now heightening the apprehension of an already anxious population. The latest hoopla2 will stimulate the call for expensive second and third radiologic opinions and deflect attention from a vital point that is made in the editorial by Kopans.3 Mammography is an effective screening technique but not an accurate diagnostic technique. The essential purpose of a mammogram is only to demonstrate an important abnormality at the earliest possible time. A definitive pathologic diagnosis is to be expected in a very small proportion of cases.

Although the radiologists who participated in the study by Elmore et al. were made aware of the clinical findings, they obviously could not examine the patients but were requested to suggest a plan of management. Readers must not come away from this article with the mistaken impression that an appropriate plan of management can be devised solely on the basis of a mammogram. The mammogram complements the history and physical examination. The physician who is primarily responsible for the care of the patient is the one who makes the essential management decisions.

William Silen, M.D.
Harvard Medical School, Boston, MA 02215

References
  1. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretations of mammograms. N Engl J Med 1994;331:1493-1499.
  2. Foreman J. Interpreting mammograms can be difficult, study finds. Boston Globe. December 1, 1994:1.
  3. Kopans DB. The accuracy of mammographic interpretation. N Engl J Med 1994;331:1521-1522.

To the Editor:

As Paul Newman said in Cool Hand Luke, “What we've got here is a failure to communicate.” Dr. Kopans correctly begins his editorial with the statement, “It should come as no surprise that there is a range of skills and expertise among physicians involved in similar activities.” Of course not, but this is not really the problem. The problem is how best to communicate mammographic findings, since as he says, “It is possible to estimate the probability of cancer, given the abnormalities found by mammography.”

The solution to that problem has already been drafted by a committee of the American College of Radiology, which Dr. Kopans cochaired. A standard lexicon and a standard classification of abnormalities, according to the probability of cancer, along with a standard format for mammographic reports, are part of the solution, which has existed in draft form for some time. For a few years now, I have been anxiously awaiting the publication and implementation of that system.

A lexicon that will allow us to speak the same language, a standard reporting format, and a classification system that will facilitate the assignment of probabilities to specific categories should be adopted by the radiology community and distributed to those of us who use mammographic reports. We would then be in a much better position to advise and consult with our patients about the course of action to be taken. We cannot do much about variability in skills and decision thresholds, but we can communicate more precisely and effectively by using a better system.

Thomas A. Gaskin, M.D.
917 Tuscaloosa Ave. S.W., Birmingham, AL 35211

To the Editor:

The study by Elmore et al. focused on the ranges of sensitivity (the true positive proportion) and specificity (the true negative proportion). We suggest taking another simple step in the analysis that will help determine how to go about reducing variability and standardizing according to the best performance.

One can plot the receiver operating characteristic (ROC),1 a graph with the sensitivity plotted on the vertical axis and the complement of specificity (i.e., the false positive proportion) plotted on the horizontal axis, as shown in Figure 1. Data points representing a given discrimination acuity will cluster about a curve that runs from the lower left corner to the upper right corner as the decision threshold varies from strict (few positive decisions of either kind) to lenient (many positive decisions). Data on recommendations for a biopsy, from Table 3 in the article by Elmore et al., are shown in Figure 1. The 10 data points represent the 10 readers in their study. The curve shown is an approximate best-fitting curve for this group of readers.

Figure 1. A Graph of the Receiver Operating Characteristic (ROC), with the True Positive Proportion (Indicating Sensitivity) on the Left Vertical Axis and the False Positive Proportion on the Lower Horizontal Axis.
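As an illustration of how each reader becomes one point on such a graph, the following is a minimal sketch in Python. The counts are hypothetical placeholders, not the data from Table 3 of Elmore et al., and the function name `roc_point` and the reader labels are assumptions made only for this example.

```python
def roc_point(tp, fn, fp, tn):
    """Return (false positive proportion, true positive proportion) for one reader.

    tp, fn: cancer cases called positive / negative
    fp, tn: non-cancer cases called positive / negative
    """
    true_positive_proportion = tp / (tp + fn)    # sensitivity
    false_positive_proportion = fp / (fp + tn)   # 1 - specificity
    return (false_positive_proportion, true_positive_proportion)


# Hypothetical counts (tp, fn, fp, tn) for three imaginary readers whose
# decision thresholds range from strict to lenient. Not data from the study.
readers = {
    "strict": (15, 10, 5, 95),
    "moderate": (20, 5, 20, 80),
    "lenient": (24, 1, 45, 55),
}

# Sorting by false positive proportion orders the points from the lower left
# of the ROC graph toward the upper right.
points = sorted(roc_point(*counts) for counts in readers.values())

for fpp, tpp in points:
    print(f"false positive proportion = {fpp:.2f}, true positive proportion = {tpp:.2f}")
```

In this made-up example, the stricter reader sits near the lower left and the more lenient reader near the upper right, which is the movement along the curve that the letter attributes to differences in decision threshold.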

Since the variation above and below the curve is small relative to the variation along the curve, we would focus first on adjusting the thresholds of selected readers. There are various ways to select the appropriate threshold, including choosing a desired yield of a biopsy. Through instruction and feedback about the results of their decisions, we could encourage, for example, both the strict and the lenient readers to converge on the moderate readers' thresholds. We could reduce the variation and set the mean about where qualified judges think it should be. The desired threshold would probably be different in screening and referral settings.

The variation among the readers in the present study may not be far from an irreducible statistical minimum, a possibility that may be comforting to the general audience. However, one can follow a procedure to determine quantitatively the features of a mammogram that are most diagnostic and to help readers assess those features (on a checklist with quantitative scales) and then merge the assessments appropriately (perhaps by computer) into an overall estimate of the likelihood of cancer. As Elmore et al. imply, such decision aids have reduced variability and provided general improvements in accuracy.2-4

Carl J. D'Orsi, M.D.
John A. Swets, Ph.D.
BBN Systems and Technology, Cambridge, MA 02138

References
  1. McNeil BJ, Keeler E, Adelstein SJ. Primer on certain elements of medical decision making. N Engl J Med 1975;293:211-215.
  2. D'Orsi CJ, Getty DJ, Swets JA, Pickett RM, Seltzer SE, McNeil BJ. Reading and decision aids for improved accuracy and standardization of mammographic diagnosis. Radiology 1992;184:619-622.
  3. Swets JA, Getty DJ, Pickett RM, D'Orsi CJ, Seltzer SE, McNeil BJ. Enhancing and evaluating diagnostic accuracy. Med Decis Making 1991;11:9-18.
  4. Getty DJ, Pickett RM, D'Orsi CJ, Swets JA. Enhanced interpretation of diagnostic images. Invest Radiol 1988;23:240-252.

To the Editor:

Kopans states, “A positive predictive value of 15 to 25 percent for biopsies of lesions detected by clinical breast examination has long been considered acceptable for intervention . . . [and it is] reasonable for this rate to be considered acceptable for mammography as well.” Along with many others, I believe that the proper positive predictive value for cancer in biopsies of nonpalpable lesions diagnosed on screening mammograms should be 30 to 40 percent.1 This is double the rate that some equally dedicated and informed mammographers would recommend. The average breast cancer takes 7 to 8 years to become detectable by mammography and 8 to 10 years to become palpable. I am willing to wait an additional six months for follow-up of a finding that I believe is probably benign or minimally suspicious. However, with my criteria for biopsy or follow-up — indeed, any criteria — there will be a finite number of cancers with a delayed diagnosis, and for a very small number of women with these cancers, the prognosis will be changed by the delay.
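For readers who want the arithmetic behind these percentages, the positive predictive value of a biopsy recommendation is the fraction of biopsies that yield cancer. The sketch below uses hypothetical counts chosen only to land inside the two ranges quoted above; the function name is likewise an assumption for illustration.

```python
def positive_predictive_value(cancers_found, biopsies_performed):
    """Fraction of biopsies that yield a diagnosis of cancer."""
    if biopsies_performed <= 0:
        raise ValueError("biopsies_performed must be positive")
    return cancers_found / biopsies_performed


# Hypothetical counts, not data from any study: two imaginary practices that
# each perform 100 biopsies but apply different thresholds for recommending one.
lower_threshold_practice = positive_predictive_value(20, 100)   # within 15 to 25 percent
higher_threshold_practice = positive_predictive_value(35, 100)  # within 30 to 40 percent

print(f"{lower_threshold_practice:.0%} versus {higher_threshold_practice:.0%}")
```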

Perhaps some of the differences in interpretations and recommendations documented by Elmore et al. should be viewed as healthy and desirable rather than as a cause for concern and a reason for a push toward conformity.

Ferris M. Hall, M.D.
Beth Israel Hospital, Boston, MA 02215

References
  1. Hall FM, Storella JM, Silverstone DZ, Wyshak G. Nonpalpable breast lesions: recommendations for biopsy on suspicion of carcinoma at mammography. Radiology 1988;167:353-358.

Author/Editor Response

The authors reply:

To the Editor: We agree with Dr. Silen that management should be planned on the basis of all the available clinical information. Nevertheless, although screening mammography is not intended for definitive diagnoses of breast cancer, an abnormal result on screening can sometimes suggest an appropriate plan of management, despite the absence of an indicative clinical history or physical findings.

The use of ROC curves suggested by Drs. D'Orsi and Swets is an interesting way of showing points for 10 observers, but we doubt that it contributes more than the visual display. A reduction in variability will surely require more than quantitative analysis alone.

We think it is “healthy and desirable” (to use Dr. Hall's phrase) for radiologists to discover that they can be inconsistent, but we do not advocate pushing “toward conformity” in situations in which the best approach has not yet been established. We continue to urge radiologists to develop better procedures for reducing their inconsistencies and for making optimal decisions in both diagnosis and management.

Joann G. Elmore, M.D., M.P.H.
Carolyn K. Wells, M.P.H.
Alvan R. Feinstein, M.D.
Yale University School of Medicine, New Haven, CT 06504
