Chest radiographs of 499 patients were analyzed in the present study. The mean age was 65.4 ± 17.0 (median 67.6, range 22-97) years.
Overall, the image quality of the vast majority of X-rays was “good” or “excellent”. Only 1.2% of the X-rays were rated “adequate”. The most frequently cited reason for suboptimal image quality was “overlapping soft tissue”. Image quality details are summarized in Table 1.
Table 1 Image quality rating and reasons for sub-optimal image quality.
Ground Truth
A total of 499 X-ray examinations were included in the present study: 386 examinations comprised radiographs in posteroanterior (PA) and lateral views, and 113 examinations comprised radiographs in PA view only.
To determine the ground truth, not only the radiographs included in this study but also additional examinations were taken into account. In 375 of the 499 included cases, additional X-rays and/or CT scans were available at the time the ground truth was defined.
Regarding additional chest X-rays, 332 patients had at least one additional chest X-ray. Prior radiographs were available in 299 cases and follow-up radiographs in 136 cases. In 103 cases, both prior and follow-up radiographs were available.
Likewise, 237 patients had at least one CT scan that included the chest. 186 cases had a CT scan taken before the date of the radiograph included in this study. 121 cases had a CT scan taken after the date of the radiograph included in this study. 70 cases had CT scans taken both before and after the date of the radiograph included in this study.
None of the predefined findings were present on 312 of the 499 analyzed radiographs (62.5%). Accordingly, at least one of the predefined findings was present on 187 radiographs (37.4%). Of these radiographs, the majority had one (n = 99) or two (n = 62) predefined findings. Table 2 shows the distribution of the predefined findings.
Table 2 Total number of verified findings on all evaluated X-ray images.
The written report and the AI-Rad analysis came to the same conclusion in 251 cases (50.3%) and to different conclusions in 248 cases (49.7%). The written report agreed with the ground truth in 366 cases (73.3%) and disagreed with it in 133 cases (26.7%). Likewise, AI-Rad agreed with the ground truth in 276 cases (55.3%) and disagreed with it in 223 cases (44.7%).
lung lesions
The results for the detection of lung lesions are shown in Fig. 1 and Table 3. An example of the detection of a lung lesion is shown in Fig. 2. Considering all confidence scores (CS ≥ 6), AI-Rad offered high sensitivity (0.83) and specificity (0.83) for detecting lung lesions, with excellent NPV (0.97) but a high FDR (0.62). As the CS level increased, the FDR decreased significantly (0.20 at CS = 10). At the same time, however, the sensitivity also decreased significantly (0.28 at CS = 10). NPV remained high (0.91 at CS = 10).
Figure 1
Performance metrics for lung lesion detection. (A) Receiver Operating Characteristic (ROC) curves indicating the performance of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for lung lesion detection (area under the curve (AUC) AI = 0.867 / WR = 0.750). (B) Sensitivity (Sens.), negative predictive value (NPV), and false detection rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for lung lesion detection. At a confidence score (CS) ≥ 6, the sensitivity of the AI is significantly higher than that of the WR; at the same time, the FDR is significantly higher. At CS = 10, the FDR is comparable to that of the WR, but the sensitivity is significantly lower. The NPV of the AI is high at all CS levels.
Table 3 Radiologist (WR = written report) and AI Rad Companion Chest X-ray (AI-Rad) performance metrics for lung lesion detection.
Figure 2
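For reference, the four metrics reported throughout this section follow from the counts of a 2×2 confusion matrix. The sketch below is a minimal illustration in Python; the function names `safe_ratio` and `detection_metrics` and the example counts are hypothetical and are not taken from the study data.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def detection_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, NPV and FDR from confusion-matrix counts."""
    return {
        "sensitivity": safe_ratio(tp, tp + fn),  # detected findings / all true findings
        "specificity": safe_ratio(tn, tn + fp),  # correct negatives / all true negatives
        "npv": safe_ratio(tn, tn + fn),          # correct negatives / all negative calls
        "fdr": safe_ratio(fp, fp + tp),          # false alarms / all positive calls
    }


# Hypothetical counts for illustration only (not study data).
example = detection_metrics(tp=40, fp=60, fn=10, tn=390)
```

Note that FDR equals 1 minus the positive predictive value, so a high FDR can coexist with a high NPV when true findings are rare.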
Example of a lung lesion detected by the AI Rad Companion chest X-ray (A, B) and confirmed by a chest CT scan (C): 74-year-old male patient diagnosed 2 years ago with tongue carcinoma.
The sensitivity of the written report for detecting lung lesions was comparatively low (0.52). Both specificity (0.98) and NPV (0.94) were excellent. At the same time, the FDR was comparatively low (0.21).
consolidation
The results for the detection of consolidations are shown in Fig. 3 and Table 4. An example of a detected consolidation is shown in Fig. 4. AI-Rad provides good sensitivity (0.88) and specificity (0.77) for detecting consolidations when all confidence scores are considered (CS ≥ 6). With increasing CS, sensitivity decreases significantly (0.14 at CS = 10), while specificity increases (0.99 at CS = 10). The NPV is excellent at all CS levels (0.98 at CS ≥ 6; 0.91 at CS = 10). The FDR is relatively high when all CS are considered (0.70 at CS ≥ 6), but decreases significantly with increasing CS (0.36 at CS = 10).
Figure 3
Performance metrics for the detection of consolidations. (A) Receiver Operating Characteristic (ROC) curves showing the performance of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of consolidations (area under the curve (AUC) AI = 0.873 / WR = 0.868). (B) Sensitivity (Sens.), negative predictive value (NPV) and false detection rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of consolidations. At a confidence score (CS) ≥ 6, the sensitivity of the AI is slightly higher than that of the WR; at the same time, the FDR is significantly higher. At CS = 10, the FDR is comparable to that of the WR, but the sensitivity is much lower. The NPV of the AI is high at all CS levels.
Table 4 Radiologist (WR = written report) and AI Rad Companion Chest X-ray (AI-Rad) performance metrics for detecting consolidations.
Figure 4
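The trade-off described above, where raising the confidence score (CS) threshold lowers both sensitivity and FDR, can be sketched by re-scoring the same findings at each threshold. The data below are invented for illustration; the function name `sweep_thresholds` and the list-of-pairs layout are assumptions, not the study's actual pipeline.

```python
def sweep_thresholds(cases, thresholds=range(6, 11)):
    """Compute (sensitivity, FDR) per CS threshold.

    cases: list of (cs, truth) pairs, where cs is the AI confidence score
    (None when the AI reported no finding) and truth is the ground-truth label.
    """
    results = {}
    for t in thresholds:
        tp = sum(1 for cs, truth in cases if cs is not None and cs >= t and truth)
        fp = sum(1 for cs, truth in cases if cs is not None and cs >= t and not truth)
        fn = sum(1 for cs, truth in cases if truth and (cs is None or cs < t))
        sens = tp / (tp + fn) if (tp + fn) else float("nan")
        fdr = fp / (tp + fp) if (tp + fp) else float("nan")
        results[t] = (sens, fdr)
    return results


# Invented toy data: four true findings and five cases without the finding.
toy_cases = [(10, True), (8, True), (6, True), (None, True),
             (6, False), (7, False), (9, False), (None, False), (None, False)]
toy_results = sweep_thresholds(toy_cases)
```

In this toy data, sensitivity falls from 0.75 at CS ≥ 6 to 0.25 at CS = 10, while the FDR falls from 0.5 to 0.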
Example of consolidation detected by AI Rad Companion chest X-ray (A, B) and confirmed by CT scan (C): 72-year-old male patient admitted to hospital with fever, shortness of breath and severe cough.
The sensitivity of the written report for detecting consolidations was good (0.78). Both specificity (0.98) and NPV (0.94) were excellent. In addition, the FDR was comparatively low (0.35).
atelectasis
The results for the detection of atelectasis are shown in Fig. 5 and Table 5. AI-Rad offers moderate sensitivity for the detection of atelectasis (0.54 at CS ≥ 6), which decreases significantly with increasing CS (0.04 at CS = 10). Both specificity (0.92 at CS ≥ 6) and NPV (0.90 at CS ≥ 6) remain very high at all CS levels. The FDR is highest when all CS are considered (0.40 at CS ≥ 6) and decreases significantly with increasing CS level.
Figure 5
Performance metrics for atelectasis detection. (A) Receiver Operating Characteristic (ROC) curves showing the performance of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of atelectasis (area under the curve (AUC) AI = 0.743 / WR = 0.702). (B) Sensitivity (Sens.), negative predictive value (NPV), and false detection rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of atelectasis. At a confidence score (CS) ≥ 6, the sensitivity of the AI is slightly higher than that of the WR; at the same time, the FDR is slightly higher. At CS ≥ 7, the sensitivity and FDR of AI and WR are at a similar level. At CS ≥ 8, both the sensitivity and the FDR decrease significantly. The NPV of the AI is high at all CS levels.
Table 5 Radiologist (WR = written report) and AI Rad Companion Chest X-ray (AI-Rad) performance metrics for detecting atelectasis.
Likewise, the written report offers a moderate sensitivity (0.43) for the detection of atelectasis. Both the specificity (0.97) and the NPV (0.89) are excellent. The FDR is at a low level (0.24).
pneumothorax
The results for the detection of pneumothoraces are shown in Fig. 6 and Table 6. When analyzing the performance metrics for the detection of pneumothoraces, it should be noted that the prevalence of pneumothoraces in the cohort was very low (2.0%), which influences the calculation of the performance metrics.
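The influence of prevalence can be made concrete with Bayes' rule: at fixed sensitivity and specificity, the FDR (1 minus the positive predictive value) rises as prevalence falls. The sketch below is illustrative only; the function name `fdr_from_prevalence` is hypothetical, and the sensitivity and specificity values passed in are placeholders, not results from Table 6.

```python
def fdr_from_prevalence(sensitivity, specificity, prevalence):
    """False detection rate (1 - PPV) derived via Bayes' rule."""
    true_pos = sensitivity * prevalence                    # P(test+, disease+)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+, disease-)
    return false_pos / (true_pos + false_pos)


# Placeholder operating point, evaluated at a low and a higher prevalence.
fdr_low_prev = fdr_from_prevalence(0.70, 0.95, 0.02)
fdr_high_prev = fdr_from_prevalence(0.70, 0.95, 0.20)
```

With these placeholder inputs, the same classifier yields an FDR of roughly 0.78 at 2% prevalence but roughly 0.22 at 20% prevalence.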
Figure 6
Performance metrics for detecting pneumothoraces. (A) Receiver Operating Characteristic (ROC) curves showing the performance of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of pneumothoraces (area under the curve (AUC) AI = 0.830 / WR = 0.848). (B) Sensitivity (Sens.), negative predictive value (NPV) and false detection rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of pneumothoraces. CS = confidence score.
Table 6 Radiologist (WR = written report) and AI Rad Companion Chest X-ray (AI-Rad) performance metrics for detecting pneumothoraces.
AI-Rad offers good sensitivity for detecting pneumothoraces when all CS values are considered (0.70 at CS ≥ 6). However, the sensitivity decreases significantly with increasing CS level (0.30 at CS = 10). Both the specificity and the NPV are excellent at all CS levels. The FDR is comparatively high at all CS levels (0.70 at CS = 10). In absolute numbers, AI-Rad correctly identified 7 out of 10 pneumothoraces. At the same time, AI-Rad produced 23 false positives for pneumothoraces (see also Fig. 7).
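As a rough cross-check, the absolute numbers above can be turned back into rates. The sketch assumes that the 7 true positives and 23 false positives refer to the same operating point (presumably CS ≥ 6) and that all 499 radiographs were scored; both are assumptions, so the derived values are indicative only and may differ from those in Table 6.

```python
total = 499              # radiographs in the study
positives = 10           # pneumothoraces in the ground truth
tp, fp = 7, 23           # counts quoted in the text (operating point assumed)

fn = positives - tp              # missed pneumothoraces
tn = total - positives - fp      # correctly negative radiographs

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
npv = tn / (tn + fn)
fdr = fp / (tp + fp)
```

Under these assumptions the sensitivity is 0.70, and 23 of the 30 positive calls are false, which illustrates how a low-prevalence finding can combine excellent specificity and NPV with a high FDR.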
Figure 7
Three examples of false-positive pneumothoraces (A-C) indicated by the AI Rad Companion Chest X-ray.
The written report offers good sensitivity for detecting pneumothoraces (0.70). The specificity (1.0), NPV (0.99) and FDR (0.22) are excellent.
pleural effusion
The results for the detection of pleural effusions are shown in Fig. 8 and Table 7. AI-Rad provides good sensitivity for pleural effusion detection when all CS values are considered (0.74 at CS ≥ 6). However, the sensitivity decreases dramatically with increasing CS value (0.02 at CS = 10). Specificity was excellent at all CS levels. The NPV decreased slightly with increasing CS but remained at a very good level (e.g., 0.81 at CS = 10). The FDR was very low at all CS values (e.g., 0.13 at CS ≥ 6).
Figure 8
Performance metrics for pleural effusion detection. (A) Receiver Operating Characteristic (ROC) curves showing the performance of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for the detection of pleural effusion (area under the curve (AUC) AI = 0.861 / WR = 0.910). (B) Sensitivity (Sens.), negative predictive value (NPV), and false detection rate (FDR) of the AI Rad Companion Chest X-ray (AI) and the radiologist (WR = written report) for pleural effusion detection. The sensitivity of the AI is lower than that of the WR at all confidence score (CS) levels.
Table 7 Radiologist (WR = written report) and AI Rad Companion Chest X-ray (AI-Rad) performance metrics for detection of pleural effusions.
The written report provided very good sensitivity (0.88) and excellent specificity (0.94) and NPV (0.97) for the detection of pleural effusions. At the same time, the FDR was low (0.21).