INDEX
Explanations
mentions of social media platforms and calls to follow or engage with them
Negative Logits
Token       Logit
ToWorld     -0.16
lies        -0.15
ango        -0.15
ANGO        -0.15
','.        -0.14
ácil        -0.14
ORIGINAL    -0.14
Bilim       -0.14
mtx         -0.14
ucha        -0.14
Positive Logits
Token       Logit
é§          0.17
ALES        0.15
lander      0.15
uben        0.14
Camp        0.14
Butler      0.14
Singleton   0.14
327         0.13
Son         0.13
sole        0.13
Activations Density 0.019%