INDEX
Explanations
references to political advocacy and opposition
Negative Logits
colony    -0.15
peek      -0.15
Colony    -0.15
ula       -0.14
taraf     -0.14
narc      -0.14
571       -0.14
nom       -0.13
andr      -0.13
apes      -0.13
Positive Logits
ITE       0.15
İl        0.14
sville    0.14
aits      0.14
рай       0.14
단체      0.14
:index    0.13
urum      0.13
losses    0.13
Pitch     0.13
Activations Density 0.224%