INDEX
Explanations
references to political figures and their statements
Negative Logits
Token          Logit
ÙĩÙĦ           -0.16
ago            -0.15
scription      -0.15
zee            -0.15
Ðĭ             -0.14
inscription    -0.14
æĭį            -0.14
.Preference    -0.13
ark            -0.13
Voter          -0.13
Positive Logits
Token          Logit
anner          0.16
opers          0.15
xab            0.15
nav            0.15
uml            0.14
saber          0.14
rea            0.14
Desire         0.14
åĨ             0.13
á»Ļ            0.13
Activations Density 0.077%
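
Several token strings in the logit lists (for example `ÙĩÙĦ`, `æĭį`, `á»Ļ`) look like byte-level BPE tokens printed in their vocabulary form rather than as readable text. The page does not name the tokenizer. The sketch below assumes a GPT-2 style byte-to-character table and shows how such strings could be mapped back to UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative and do not come from the source.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild a GPT-2 style byte -> printable-character table.

    ASSUMPTION: the tokenizer behind this dashboard uses this mapping;
    the source page does not confirm it.
    """
    # Bytes that are already printable map to themselves.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    codepoints = byte_values[:]
    extra = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            codepoints.append(256 + extra)
            extra += 1
    return dict(zip(byte_values, map(chr, codepoints)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into text.

    Tokens that end in the middle of a multi-byte character decode
    to U+FFFD, because the token only holds part of that character.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÙĩÙĦ", "Ðĭ", "æĭį", "á»Ļ", "åĨ", "ago"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ÙĩÙĦ` decodes to the Arabic string `هل`, `æĭį` to `拍`, and `á»Ļ` to `ộ`. `åĨ` holds only the first two bytes of a three-byte character, so it cannot be decoded on its own. Plain ASCII tokens such as `ago` pass through unchanged.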