Explanations
References to political positions and titles, particularly those involving "prime."
NEGATIVE LOGITS
ëŀ        -0.16
neau      -0.16
weg       -0.15
(es       -0.15
aines     -0.14
enate     -0.14
ísticas   -0.13
anken     -0.13
Cort      -0.13
ieu       -0.13
POSITIVE LOGITS
mover     0.15
ayer      0.15
ps        0.15
è»        0.15
820       0.14
ادا       0.14
ordial    0.14
erea      0.14
ayers     0.13
voices    0.13
Activation density: 0.032%