Explanations
social media platform names
mentions of social media platforms
Negative Logits
experien     -0.72
chwitz       -0.68
footing      -0.67
quartered    -0.65
undercover   -0.65
stopp        -0.64
ynthesis     -0.64
aterasu      -0.63
sole         -0.63
conflicted   -0.63
Positive Logits
Tumblr
1.39
1.27
1.25
1.21
1.20
1.12
1.09
Tumblr
1.07
0.94
0.93
Activations Density 0.046%