INDEX
Explanations
references to political controversies and criticisms
Negative Logits
otlin     -0.15
pector    -0.14
ekim      -0.14
aris      -0.14
ilde      -0.14
Gone      -0.14
mite      -0.14
ucha      -0.14
ucken     -0.14
asz       -0.13
Positive Logits
quarters    0.24
detr        0.24
academics   0.23
critics     0.21
voices      0.21
column      0.21
opponents   0.21
some        0.21
segments    0.20
prominent   0.20
Activations Density: 0.460%