Explanations
references to political figures and their classifications
Negative Logits
lix      -0.06
iban     -0.06
eza      -0.06
avia     -0.06
bo       -0.06
inger    -0.06
za       -0.06
arya     -0.06
-        -0.06
icum     -0.05
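Logit lists like these are commonly computed by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the lowest- and highest-scoring vocabulary entries. The exact normalization behind the numbers shown here is not stated. The pure-Python sketch below assumes that projection. `logit_effects`, `w_dec`, `W_U`, and the toy vocabulary are hypothetical names and data, not this feature's real weights.

```python
from typing import List, Tuple


def logit_effects(
    w_dec: List[float],
    W_U: List[List[float]],
    vocab: List[str],
    k: int = 3,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: project a decoder direction onto the unembedding.

    w_dec: feature decoder vector of length d_model (assumed input).
    W_U:   unembedding given as d_model rows, each of length len(vocab).
    Returns (bottom-k, top-k) lists of (token, score) pairs.
    """
    # Score for vocabulary entry j is the dot product of w_dec with column j of W_U.
    scores = [
        sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with made-up numbers (d_model = 2, vocab of 4 tokens).
vocab = ["lix", "iban", "Leban", "haf"]
w_dec = [1.0, -0.5]
W_U = [
    [-0.1, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.0, 0.1],
]
negative, positive = logit_effects(w_dec, W_U, vocab, k=2)
print([tok for tok, _ in negative])
print([tok for tok, _ in positive])
```

With `k=2`, the toy call ranks `lix` and `iban` as the most negative tokens and `Leban` and `haf` as the most positive.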
Positive Logits
_nsec    0.07
ï¼Ĭ      0.07
ãĥ³ãĥij  0.07
ìĶ       0.07
klu      0.07
owied    0.07
ÅĻej     0.07
haf      0.07
WEEN     0.07
Leban    0.06
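Tokens such as `ï¼Ĭ` and `ãĥ³ãĥij` look garbled because they are displayed in a byte-level BPE alphabet, where every raw byte is mapped to a printable character. The model behind this dashboard is not named here, so the following is an assumption: if it uses GPT-2's byte-level tokenizer, the sketch below rebuilds GPT-2's byte-to-character table and maps a displayed token back to UTF-8 text. `decode_token` is a hypothetical helper name.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2's byte -> printable-character table (same construction as its encoder)."""
    # Bytes that are already printable keep their own character.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    # Remaining bytes (control chars, space, etc.) are shifted to code points 256+n.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a displayed byte-level BPE token back into text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token can end mid-character, so undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


for tok in ["ï¼Ĭ", "ãĥ³ãĥij", "ÅĻej", "ìĶ", "haf"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ï¼Ĭ` decodes to the fullwidth asterisk `＊`, `ãĥ³ãĥij` to the katakana `ンパ`, and `ÅĻej` to `řej`, while `ìĶ` holds only the first two bytes of a three-byte character and decodes to a replacement character.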
Activation Density: 0.008%
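Activation density is usually reported as the fraction of sampled token positions on which the feature fires, so 0.008% corresponds to roughly 8 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero and uses a synthetic activation list; `activation_density` is a hypothetical helper, and the token sample behind the dashboard's figure is not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions whose activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data: 8 active positions out of 100,000 tokens.
acts = [0.0] * 99_992 + [1.5] * 8
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.008%
```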