13. Statistics | Radiology Review Manual (Dahnert, Radiology Review Manual)

Authors: Dahnert, Wolfgang

Title: Radiology Review Manual, 6th Edition

Statistics

Terminology

Incidence = number of newly diseased people (new cases) per 100,000 population per year
Prevalence = number of existing cases per 100,000 population at a target date
Mortality = number of deaths per 100,000 population per year
Fatality = number of deaths per number of diseased
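
The four rates above differ only in what is counted and what it is divided by. A minimal sketch of the arithmetic follows; all figures are hypothetical, and the function names are illustrative rather than taken from the text.

```python
def rate_per_100k(events: int, population: int) -> float:
    """Number of events per 100,000 population."""
    return events * 100_000 / population


def fatality(deaths: int, diseased: int) -> float:
    """Deaths per number of diseased (a proportion, not a population rate)."""
    return deaths / diseased


# Hypothetical figures for one disease in one population (assumptions).
population = 250_000
new_cases_this_year = 50        # counted over one year  -> incidence
existing_cases_today = 400      # counted at target date -> prevalence
deaths_this_year = 10           # counted over one year  -> mortality

incidence = rate_per_100k(new_cases_this_year, population)
prevalence = rate_per_100k(existing_cases_today, population)
mortality = rate_per_100k(deaths_this_year, population)
case_fatality = fatality(deaths_this_year, existing_cases_today)

print(incidence, prevalence, mortality, case_fatality)
```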

Decision matrix

GOLD STANDARD
		normal	abnormal	subtotal
T E S T
	normal	TN	FN	T-	NPV
	abnormal	FP	TP	T+	PPV
	subtotal	D-	D+	total	prevalence
		specificity	sensitivity		accuracy

TP = test positive in diseased subject
FP = test positive in nondiseased subject
FN = test negative in diseased subject
TN = test negative in nondiseased subject
T+ = abnormal test result
T- = normal test result
D+ = diseased subjects
D- = nondiseased subjects

Sensitivity

= ability to detect disease
= probability of having an abnormal test given disease
= number of correct positive tests / number with disease
= true positive ratio = TP / (TP + FN) = TP / D+
D+ column in decision matrix
Independent of prevalence

Specificity

= ability to identify absence of disease
= probability of having a negative test given no disease
= number of correct negative tests / number without disease
= true negative ratio = TN / (TN + FP) = TN / D-
D- column in decision matrix
Independent of prevalence

Accuracy

= number of correct results in all tests
= number of correct tests / total number of tests
= (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / total
Depends strongly on the proportion of diseased + nondiseased subjects in the studied population
Not valuable for comparison of tests
Example: same test accuracy of 90% for two tests A and B

Test A: 90% accuracy

GOLD STANDARD
		normal	abnormal	subtotal
T E S T
	normal	90	10	100	90%
	abnormal	10	90	100	90%
	subtotal	100	100	200	50%
		90%	90%		90%

Test B: 90% accuracy

GOLD STANDARD
		normal	abnormal	subtotal
T E S T
	normal	170	20	190	89%
	abnormal	0	10	10	100%
	subtotal	170	30	200	15%
		100%	33%		90%

Positive Predictive Value

= positive test accuracy
= likelihood that a positive test result actually identifies presence of disease
= number of correct positive tests / number of positive tests
= TP / (TP + FP) = TP / T+
T+ row in decision matrix
Dependent on prevalence
PPV increases with increasing prevalence for given sensitivity + specificity
PPV increases with increasing specificity for given prevalence

Negative Predictive Value

= negative test accuracy
= likelihood that a negative test result actually identifies absence of disease
= number of correct negative tests / number of negative tests
= TN / (TN + FN) = TN / T-
T- row in decision matrix
Dependent on prevalence
NPV increases with decreasing prevalence for given sensitivity + specificity
NPV increases with increasing sensitivity for given prevalence
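
The definitions of sensitivity, specificity, accuracy, PPV, and NPV can be checked against the Test A and Test B matrices of this chapter. The sketch below is illustrative only: the class name `DecisionMatrix` and its field names are assumptions, and the cell counts are copied from the two tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionMatrix:
    """2 x 2 decision matrix; raises ZeroDivisionError if a margin is empty."""
    tp: int  # test positive in diseased subject
    fp: int  # test positive in nondiseased subject
    fn: int  # test negative in diseased subject
    tn: int  # test negative in nondiseased subject

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)      # TP / D+

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)      # TN / D-

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)      # TP / T+

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)      # TN / T-


test_a = DecisionMatrix(tp=90, fp=10, fn=10, tn=90)
test_b = DecisionMatrix(tp=10, fp=0, fn=20, tn=170)

for name, m in (("A", test_a), ("B", test_b)):
    print(
        f"Test {name}: accuracy={m.accuracy:.0%} sensitivity={m.sensitivity:.0%} "
        f"specificity={m.specificity:.0%} PPV={m.ppv:.0%} NPV={m.npv:.0%}"
    )
```

Both tests reach the same accuracy, although Test B detects only one third of diseased subjects; this is why accuracy alone is a poor basis for comparing tests.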

False-positive Ratio

= proportion of nondiseased patients with an abnormal test result
D- column in decision matrix
= FP / (FP + TN) = FP / D-
= 1 - specificity = (TN + FP - TN) / (TN + FP)

False-negative Ratio

= proportion of diseased patients with a normal test result
D+ column in decision matrix
= FN / (TP + FN) = FN / D+
= 1 - sensitivity = (TP + FN - TP) / (TP + FN)

Disease Prevalence

= proportion of diseased subjects to total population
= (TP + FN) / (TP + TN + FP + FN) = D+ / total

Test C: prevalence of 10%, 90% sensitivity + 90% specificity

GOLD STANDARD
		normal	abnormal	subtotal
T E S T
	normal	162	2	164	99%
	abnormal	18	18	36	50%
	subtotal	180	20	200	10%
		90%	90%		90%

Test D: prevalence of 90%, 90% sensitivity + 90% specificity

GOLD STANDARD
		normal	abnormal	subtotal
T E S T
	normal	18	18	36	50%
	abnormal	2	162	164	99%
	subtotal	20	180	200	90%
		90%	90%		90%

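Sensitivity + specificity are independent of prevalence
Prevalence affects predictive values + accuracy of a test result
Example: Tests A, C, D all have 90% sensitivity + 90% specificity, yet PPV is 90% (Test A, prevalence 50%), 50% (Test C, prevalence 10%), and 99% (Test D, prevalence 90%)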
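
The effect of prevalence on predictive values can be reproduced numerically. The following is a minimal sketch under the assumption that sensitivity, specificity, and prevalence are given as fractions; the function name `predictive_values` is illustrative.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected fractions of the whole population in each cell of the matrix.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same sensitivity + specificity, three prevalences (Tests C, A, D).
for label, prevalence in (("C", 0.10), ("A", 0.50), ("D", 0.90)):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"Test {label}: prevalence={prevalence:.0%} PPV={ppv:.1%} NPV={npv:.1%}")
```

With sensitivity and specificity held at 90%, PPV rises from 50% to about 99% as prevalence goes from 10% to 90%, while NPV falls from about 99% to 50%, matching the three decision matrices.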

Bayes Theorem

= the predictive accuracy of any test outcome that is less than a perfect diagnostic test is influenced by
- pretest likelihood of disease
- criteria used to define a test result
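
Written out for a binary test, the dependence on pretest likelihood takes the following form. This is the standard statement of Bayes theorem in this chapter's terms, with prevalence standing in for the pretest likelihood of disease; the text itself does not print the formula.

```latex
\[
\mathrm{PPV} = P(D^{+}\mid T^{+})
  = \frac{\text{sensitivity}\times\text{prevalence}}
         {\text{sensitivity}\times\text{prevalence}
          + (1-\text{specificity})\times(1-\text{prevalence})}
\]
\[
\mathrm{NPV} = P(D^{-}\mid T^{-})
  = \frac{\text{specificity}\times(1-\text{prevalence})}
         {\text{specificity}\times(1-\text{prevalence})
          + (1-\text{sensitivity})\times\text{prevalence}}
\]
% Worked check with Test C (sensitivity = specificity = 0.9, prevalence = 0.1):
\[
\mathrm{PPV} = \frac{0.9\times 0.1}{0.9\times 0.1 + 0.1\times 0.9} = 0.5
\]
```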

Receiver Operating Characteristics (ROC)

= degree of discrimination between diseased + nondiseased patients using varying diagnostic criteria instead of a single value for the TP + TN fraction
= curvilinear graph generated by plotting TP ratio as a function of FP ratio for a number of different diagnostic criteria (ranging from definitely normal to definitely abnormal)
Y-axis: true-positive ratio = sensitivity
X-axis: false-positive ratio = 1 - specificity; reversing the values on the X-axis results in an identical sensitivity-specificity curve
Use: variations in diagnostic criteria are reported as a continuum of responses ranging from definitely abnormal to equivocal to definitely normal due to subjectivity + bias of individual radiologist
A minimum of 4-5 data points of diagnostic criteria are needed!
Difficulty: subjective evaluation of image features; subjective diagnostic interpretation; data must be ordinal (= discrete rating scale from definitely negative to definitely positive)
Interpretation:
- Increase in sensitivity leads to decrease in specificity!
- Increase in specificity leads to decrease in sensitivity!
- The most sensitive point is the point with the highest TP ratio
  - equivalent to overreading by using less stringent diagnostic criteria (all findings read as abnormal)
- The most specific point is the point with the lowest FP ratio
  - equivalent to underreading by using more strict diagnostic criteria (all findings read as normal)
- The ROC curve closest to the Y-axis represents the best diagnostic test
- Does not consider disease prevalence in the population
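
How the data points of an ROC curve arise from an ordinal rating scale can be sketched as follows. The rating counts are hypothetical (5-point scale, 1 = definitely normal to 5 = definitely abnormal), and `roc_points` is an illustrative helper rather than a method given in the text. Each diagnostic criterion "call abnormal if rating >= k" yields one (FP ratio, TP ratio) pair.

```python
from itertools import accumulate


def roc_points(diseased_counts, nondiseased_counts):
    """Return (FP ratio, TP ratio) pairs, strictest criterion first.

    counts[i] is the number of subjects given rating i + 1.
    """
    n_diseased = sum(diseased_counts)
    n_nondiseased = sum(nondiseased_counts)
    # Cumulate from the highest rating downward: each step relaxes the
    # criterion by also calling the next lower rating "abnormal".
    tp_cumulative = accumulate(reversed(diseased_counts))
    fp_cumulative = accumulate(reversed(nondiseased_counts))
    return [
        (fp / n_nondiseased, tp / n_diseased)
        for fp, tp in zip(fp_cumulative, tp_cumulative)
    ]


# Hypothetical reader study: 50 diseased + 50 nondiseased subjects.
diseased = [2, 3, 5, 15, 25]       # ratings 1..5
nondiseased = [25, 15, 5, 3, 2]    # ratings 1..5

points = roc_points(diseased, nondiseased)
for criterion, (fp_ratio, tp_ratio) in zip(range(5, 0, -1), points):
    print(f"abnormal if rating >= {criterion}: "
          f"FP ratio={fp_ratio:.2f} TP ratio={tp_ratio:.2f}")
```

Moving from the strictest to the laxest criterion raises the TP ratio and the FP ratio together, which is the sensitivity-specificity trade-off listed under Interpretation; the last point (all findings read as abnormal) is always at FP ratio = TP ratio = 1.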

[Figure: Receiver Operating Characteristics for 3 Different Tests]

[Figure: Interpretation of Receiver Operating Characteristics]

[Figure: Receiver Operating Characteristics for Positive Predictive Value of Various Tests with Different Sensitivities and Specificities]

Confidence Limit

= degree of certainty that the proportion calculated from a sample of a particular size lies within a specific range (binomial theorem)
Analogous to the mean ± 2 SD
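
A minimal sketch of confidence limits for a proportion follows. It assumes the normal approximation to the binomial distribution (one of several possible methods; the text does not say which it intends), with z = 1.96 for 95% limits. The function name and the example counts are hypothetical.

```python
import math


def proportion_confidence_limits(successes: int, n: int, z: float = 1.96):
    """Approximate confidence limits for a proportion (normal approximation).

    Reasonable for large samples with proportions not close to 0 or 1;
    the limits are clipped to the interval [0, 1].
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * standard_error), min(1.0, p + z * standard_error)


# Hypothetical: sensitivity of 90% measured in 100 diseased subjects.
lower, upper = proportion_confidence_limits(90, 100)
print(f"95% confidence limits: {lower:.3f} to {upper:.3f}")
```

The same 90% measured in a larger sample gives narrower limits, since the standard error shrinks with the square root of the sample size.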

Clinical Epidemiology

= application of epidemiologic principles + methods to problems encountered in clinical medicine with the purpose of developing + applying methods of clinical observation that will lead to valid clinical conclusions
Epidemiology = branch of medical science dealing with incidence, distribution, determinants in control of disease within a defined population

Screening Techniques

Principal question: can early detection influence the natural history of the disease in a positive manner?
Outcome measure: early detection + effective therapy should reduce morbidity + mortality, ie, increase survival rates (observational study)!
Biases:
- Lead time = interval between disease detection at screening + the usual time of clinical manifestation; early diagnosis always appears to improve survival by at least this interval, even when treatment is ineffective
- Length time = differences in growth rates of tumors:
  - slow-growing tumors exist for a long time before manifestation thus enhancing the opportunity for detection
  - fast-growing tumors exist for a short time before manifestation thus providing less opportunity for detection at screening
  - interval cancers (= cancers detected clinically between scheduled screening exams) are likely fast-growing tumors; patients with tumors detected by means of screening tests will have a better prognosis than those with interval cancers
- Self-selection = decision to participate in screening program; usually made by patients better educated + more knowledgeable + more health-conscious; mortality rates from noncancerous causes can be expected to be lower than in general population
- Overdiagnosis = detection of lesions of questionable malignancy, eg, in situ cancers, which might never have been diagnosed without screening + have an excellent prognosis

[Figure: Receiver Operating Characteristics for Negative Predictive Value of Various Tests with Different Sensitivities and Specificities]
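
Lead-time bias, the first of the biases listed above, comes down to simple arithmetic. In the sketch below all ages are hypothetical, and treatment is assumed to be completely ineffective, so the age at death is the same with and without screening.

```python
def survival_from_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Survival time as usually reported: measured from the date of diagnosis."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient; death occurs at the same age in both scenarios.
age_at_death = 70.0
age_at_clinical_diagnosis = 67.0    # disease becomes clinically manifest
age_at_screening_diagnosis = 64.0   # same disease detected earlier at screening

lead_time = age_at_clinical_diagnosis - age_at_screening_diagnosis
survival_clinical = survival_from_diagnosis(age_at_clinical_diagnosis, age_at_death)
survival_screened = survival_from_diagnosis(age_at_screening_diagnosis, age_at_death)

print(f"lead time: {lead_time} years")
print(f"survival after clinical diagnosis:  {survival_clinical} years")
print(f"survival after screening diagnosis: {survival_screened} years")
```

Measured survival doubles although the patient does not live a day longer; the apparent gain equals the lead time.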

Randomized Trials

Design: two arms consisting of (a) study group and (b) control group with patients assigned to each arm on randomized basis
Endpoint: difference in mortality rates of both groups
Power: study must be of sufficient size + duration to detect a difference, if one exists; analogous to sensitivity of a diagnostic test
Impact on effective size of groups:
- Compliance = proportion of women allocated to screening arm of trial who undergo screening
- Contamination = proportion of women allocated to control group of trial who do undergo screening

Case-control Studies

Retrospective inquiry, which is less expensive, takes less time, is easier to perform:
- determine the number of women who died from breast cancer
- choose same number of women of comparable age who have not died from breast cancer
- ascertain the number of women who were screened + who were not screened in both arms
Calculation of odds ratio = ad / bc
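
The odds ratio can be computed directly from the four cells of the case-control table (a = screened cases, b = screened controls, c = unscreened cases, d = unscreened controls). The counts below are hypothetical and serve only to show the calculation.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio ad / bc for a 2 x 2 case-control table.

    a = cases screened        b = controls screened
    c = cases not screened    d = controls not screened
    """
    return (a * d) / (b * c)


# Hypothetical study: 100 deaths from breast cancer + 100 matched controls.
a, b = 20, 40   # screened:     cases, controls
c, d = 80, 60   # not screened: cases, controls

print(f"odds ratio = {odds_ratio(a, b, c, d):.3f}")
```

In this made-up example the ratio is below 1, meaning that screening was less common among the women who died than among the controls.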

Kappa (κ)

measures concordance between test results and gold standard
Analogous to Pearson correlation coefficient (r) for continuous data!

Example: κ = 0.743
Predictive value of κ:

0.00 - 0.20	little or none
0.20 - 0.40	slight
0.40 - 0.60	group
0.60 - 0.80	some individual
0.80 - 1.00	individual
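
A minimal sketch of how a kappa value can be obtained from a square agreement table (test category in rows, gold standard category in columns). It assumes the unweighted Cohen form, kappa = (observed agreement - chance agreement) / (1 - chance agreement); the text does not state which variant it uses, and the 2 x 2 counts are hypothetical.

```python
def cohen_kappa(table):
    """Unweighted Cohen kappa for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: fraction of subjects on the main diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance from the marginal totals alone.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 table: rows = test result, columns = gold standard.
agreement = [
    [40, 10],
    [5, 45],
]
kappa = cohen_kappa(agreement)
print(f"kappa = {kappa:.2f}")
```

Here the observed agreement is 85% and the chance agreement 50%, giving kappa = 0.70.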

Case-control Studies

	cases (died from breast cancer)	controls (did not die from breast cancer)
screened	a	b
not screened	c	d

GOLD STANDARD
T E S T
	18	3	0	0	21
	2	20	5	2	29
	1	4	20	3	28
	0	0	5	17	22
	21	27	30	22	100