INDEX
Explanations
social media platform names
mentions of social media platforms, especially Twitter
Negative Logits
sth     -0.93
sole    -0.75
ĪĴ      -0.75
onics   -0.70
mble    -0.65
rien    -0.64
clud    -0.64
kay     -0.64
-,      -0.63
rapp    -0.62
Positive Logits
(token not shown)   1.05
Whats               0.89
(token not shown)   0.77
(token not shown)   0.71
Patreon             0.69
Tumblr              0.68
ESA                 0.67
(token not shown)   0.67
Tumblr              0.66
(token not shown)   0.65
Activations Density 0.024%