INDEX
Explanations
mentions of individuals, particularly names or handles associated with social media or public figures
Negative Logits
ropa      -0.16
site      -0.15
geme      -0.14
Bare      -0.14
MMdd      -0.14
VID       -0.14
erdale    -0.14
Hague     -0.14
SPORT     -0.14
_ajax     -0.13
Positive Logits
bidden    0.18
antan     0.16
nesc      0.15
Taş       0.15
builtin   0.15
ADDE      0.14
ña        0.14
ï¸        0.14
нок       0.14
alth      0.14
Activations Density: 0.073%