INDEX
Explanations
social media handles or usernames containing specific keywords or symbols
specific repeated symbols or characters
Negative Logits
userc       -0.76
cryst       -0.75
pex         -0.74
livest      -0.74
destro      -0.69
nut         -0.66
advoc       -0.66
aturdays    -0.66
compr       -0.65
territ      -0.65
Positive Logits
————        0.80
Protesters  0.70
Hide        0.67
Hear        0.67
Recap       0.65
Written     0.65
Chapter     0.65
Rohingya    0.64
————————    0.62
Rat         0.62
Activations Density 0.078%