Explanations
references to bribery and corruption
Negative Logits
riel      -0.17
dge       -0.15
enk       -0.14
reu       -0.14
regunta   -0.14
otte      -0.14
otech     -0.14
inel      -0.14
inem      -0.14
ines      -0.13
Positive Logits
iod       0.18
ourd      0.17
lsa       0.15
isto      0.14
duk       0.14
še        0.14
untu      0.14
eniable   0.14
ulse      0.13
_TA       0.13
Activations Density: 0.087%