Explanations
references to political actions and social issues
Negative Logits
ãģ¿            -0.18
atter          -0.15
unch           -0.14
Doub           -0.14
ideo           -0.13
ely            -0.13
undy           -0.13
ActionTypes    -0.13
.toolbox       -0.13
uffix          -0.13

Positive Logits
ierz            0.16
esk             0.14
stin            0.14
aye             0.14
_INCLUDED       0.14
Nic             0.14
ìĽĶë¶ĢíĦ°       0.14
dac             0.14
ang             0.14
arie            0.14
Activation density: 0.236%
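
Two of the token strings in the logit lists (ãģ¿ and ìĽĶë¶ĢíĦ°) look like mojibake but are more likely raw vocabulary entries from a byte-level BPE tokenizer. The sketch below assumes the GPT-2 style byte-to-character mapping, which this page does not confirm, and shows how such strings could be turned back into readable text. The helper names `byte_to_unicode_table` and `decode_token` are hypothetical and exist only for this illustration.

```python
def byte_to_unicode_table():
    """Build the byte -> printable-character table used by GPT-2 style
    byte-level BPE vocabularies (assumed here, not confirmed by the source)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted into the range starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8.
    Incomplete byte sequences are replaced rather than raising an error."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãģ¿", "ìĽĶë¶ĢíĦ°", "atter", "_INCLUDED"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãģ¿ decodes to the Japanese kana み and ìĽĶë¶ĢíĦ° to the Korean 월부터, while plain ASCII tokens such as atter pass through unchanged.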