Roel Bogie

Chapter 2

Factors influencing variation in classification among experts

A sensitivity analysis excluding the most influential expert rater (identified statistically) showed comparable results (Table 2.1). Gwet's AC1 for the endoscopic Kudo classification was 0.63 (95% CI: 0.57 – 0.70). The IOA for the endoscopic Kudo classification was similar for experts from the East (SK, HMC, RS, HY, ST, TM, LCC) and those from the West (AO, AS, MR, RK, TK, RS, SS) (P = 0.520; Table 2.1).

Overall, IOA on image quality was fair to moderate (Gwet's AC1: 0.35, 95% CI: 0.28 – 0.42; mean pairwise agreement: 51.8%). After recoding excellent and good quality as a single category, the IOA was almost perfect (Gwet's AC1: 0.85, 95% CI: 0.80 – 0.89; mean pairwise agreement: 86.8%). In 34 cases (47.2%), all observers agreed that the image quality was sufficient for analysis.

The influence of case order was also tested: Gwet's AC1 was 0.61 (95% CI: 0.51 – 0.70) for the first 36 cases and 0.64 (95% CI: 0.55 – 0.73) for the second 36 cases. Excluding the 18 SSA/P cases (14 of which were predominantly scored as LST-NG-FE) resulted in a Gwet's AC1 of 0.59 (95% CI: 0.51 – 0.67; mean pairwise agreement: 69.0%). Finally, we examined the influence of lesion size on the assessment of LSTs: Gwet's AC1 was 0.72 (95% CI: 0.61 – 0.82), 0.49 (95% CI: 0.38 – 0.59) and 0.66 (95% CI: 0.55 – 0.77) for LSTs of 10–19 mm, 20–29 mm and ≥30 mm, respectively.
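The agreement measures reported above can be illustrated with a minimal sketch of the unweighted multi-rater form of Gwet's AC1, alongside mean pairwise agreement. It assumes complete data (every rater scores every case) and nominal categories; the function names (`gwet_ac1`, `mean_pairwise_agreement`, `recode`) and the toy ratings are hypothetical, and the 95% confidence intervals are not reproduced. With complete data, the observed-agreement term of AC1 equals the mean pairwise agreement, which the comments make explicit.

```python
from collections import Counter
from itertools import combinations

CATEGORIES = ["LST-G-H", "LST-G-NM", "LST-NG-FE", "LST-NG-PD"]


def mean_pairwise_agreement(ratings):
    """Share of agreeing rater pairs per case, averaged over cases.

    `ratings` holds one list per case with one label per rater
    (complete data assumed: every rater scored every case).
    """
    shares = []
    for case in ratings:
        pairs = list(combinations(case, 2))
        shares.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(shares) / len(shares)


def gwet_ac1(ratings, categories):
    """Unweighted multi-rater Gwet's AC1 for nominal categories."""
    n, q = len(ratings), len(categories)
    p_a = 0.0                            # observed agreement
    pi = dict.fromkeys(categories, 0.0)  # summed share of raters per category
    for case in ratings:
        r = len(case)
        counts = Counter(case)
        # sum_k r_k (r_k - 1) / (r (r - 1)) = agreeing pairs / all pairs
        p_a += sum(c * (c - 1) for c in counts.values()) / (r * (r - 1))
        for k in categories:
            pi[k] += counts[k] / r
    p_a /= n
    # Chance agreement: p_e = sum_k pi_k (1 - pi_k) / (q - 1), pi_k = s / n
    p_e = sum((s / n) * (1 - s / n) for s in pi.values()) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


def recode(ratings, mapping):
    """Merge categories, e.g. 'excellent' and 'good' image quality."""
    return [[mapping.get(x, x) for x in case] for case in ratings]


# Toy data (hypothetical): 4 cases scored by 3 raters.
ratings = [
    ["LST-G-H", "LST-G-H", "LST-G-H"],
    ["LST-G-NM", "LST-G-NM", "LST-G-H"],
    ["LST-NG-FE", "LST-NG-FE", "LST-NG-FE"],
    ["LST-NG-PD", "LST-NG-FE", "LST-NG-PD"],
]
print(round(mean_pairwise_agreement(ratings), 3))  # 0.667
print(round(gwet_ac1(ratings, CATEGORIES), 3))     # 0.561
```

In this sketch, the subgroup analyses described above (first vs second 36 cases, excluding SSA/P cases, or stratifying by lesion size) amount to calling `gwet_ac1` on the corresponding subset of cases, and merging excellent and good image quality corresponds to calling `recode` with a mapping such as `{"excellent": "excellent/good", "good": "excellent/good"}` before computing agreement.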

Table 2.4: Overview of all observed answer pairs, before and after training, between all fellow raters for all cases. A total of 15,120 answer pairs were given (21 raters can form 210 unique pairs [21 × 20 × ½] for each of the 72 cases [210 × 72 = 15,120]). Pairs of agreement (diagonal cells) are marked in grey. For example, before training, one randomly chosen rater classified a case as LST-G-H and another randomly chosen rater agreed 1875 times. The situation in which a randomly chosen rater classified an LST as LST-G-H while another randomly chosen rater classified it as LST-G-NM occurred 1764 times before training, i.e. 11.7% of all 15,120 observations.

A) Pre-test

| | LST-G-H | LST-G-NM | LST-NG-FE | LST-NG-PD |
|---|---|---|---|---|
| LST-G-H | 1875 (12.4%) | 1764 (11.7%) | 1398 (9.2%) | 516 (3.4%) |
| LST-G-NM | | 2485 (16.4%) | 468 (3.1%) | 410 (2.7%) |
| LST-NG-FE | | | 3070 (20.3%) | 1886 (12.5%) |
| LST-NG-PD | | | | 1248 (8.3%) |

B) Post-test

| | LST-G-H | LST-G-NM | LST-NG-FE | LST-NG-PD |
|---|---|---|---|---|
| LST-G-H | 1998 (13.2%) | 1255 (8.3%) | 774 (5.1%) | 157 (1.0%) |
| LST-G-NM | | 2604 (17.2%) | 275 (1.8%) | 220 (1.5%) |
| LST-NG-FE | | | 4332 (28.7%) | 2025 (13.4%) |
| LST-NG-PD | | | | 1480 (9.8%) |
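The pair counts in Table 2.4 follow from simple combinatorics: each case scored by 21 raters yields C(21, 2) = 210 unordered rater pairs, and 72 cases give 15,120 pairs. The sketch below illustrates that counting under the assumption of a complete ratings matrix (one label per rater per case); the function names are hypothetical and this is not the authors' analysis code. Each pair is stored once, in a fixed category order, so the result fills one triangle of a table like Table 2.4, and the share of pairs on the diagonal gives the overall observed agreement.

```python
from collections import Counter
from itertools import combinations
from math import comb

CATEGORIES = ["LST-G-H", "LST-G-NM", "LST-NG-FE", "LST-NG-PD"]


def tally_answer_pairs(ratings, categories=CATEGORIES):
    """Count unordered answer pairs over all rater pairs and all cases.

    Labels in each pair are sorted by `categories`, so an
    (LST-G-NM, LST-G-H) pair and an (LST-G-H, LST-G-NM) pair
    are counted in the same cell.
    """
    order = {c: i for i, c in enumerate(categories)}
    counts = Counter()
    for case in ratings:  # one label per rater
        for a, b in combinations(case, 2):
            counts[tuple(sorted((a, b), key=order.__getitem__))] += 1
    return counts


def as_percentages(counts):
    """Express each cell as a percentage of all answer pairs."""
    total = sum(counts.values())
    return {pair: 100 * n / total for pair, n in counts.items()}


def observed_agreement(counts):
    """Share of answer pairs in which both raters chose the same label."""
    return sum(n for (a, b), n in counts.items() if a == b) / sum(counts.values())


# Expected size of Table 2.4: 21 raters and 72 cases.
print(comb(21, 2))       # 210
print(comb(21, 2) * 72)  # 15120

# Caption examples as shares of all 15,120 pairs.
print(round(100 * 1875 / 15_120, 1))  # 12.4
print(round(100 * 1764 / 15_120, 1))  # 11.7
```

Applied to a full ratings matrix, `as_percentages(tally_answer_pairs(ratings))` would yield cells in the "count (percentage)" format used in Table 2.4.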

