INDEX
Explanations
references to individuals' names, particularly first initials followed by surnames
Negative Logits
unw      -0.18
areth    -0.18
aise     -0.17
ankan    -0.16
ules     -0.16
esy      -0.16
arty     -0.16
esh      -0.15
arya     -0.15
xz       -0.15
Positive Logits
icket    0.22
ourke    0.21
angel    0.21
attr     0.20
undle    0.20
oes      0.20
ober     0.20
ych      0.19
ivas     0.19
aptop    0.19
Activations Density 0.028%