INDEX
Explanations
references to specific public figures and political context
Negative Logits
abbit      -0.18
Alonso     -0.17
Alan       -0.17
-al        -0.17
alg        -0.17
alm        -0.17
Alabama    -0.17
abb        -0.16
aal        -0.16
Alban      -0.16
Positive Logits
cheid      0.15
B          0.14
.Bad       0.14
Ch         0.14
auss       0.14
BCH        0.14
!***       0.14
áºŃu       0.14
Ðij        0.14
ÂłB        0.14
Activations Density 0.061%