Explanations
social media platforms
mentions of the social media platform Twitter
Negative Logits
sth           -0.64
onics         -0.63
tranqu        -0.62
clud          -0.62
felon         -0.62
Reloaded      -0.60
ordinances    -0.60
cision        -0.58
itching       -0.58
unci          -0.57
Positive Logits
              0.85
Whats         0.75
edin          0.74
Attribution   0.73
Tumblr        0.71
Mehran        0.69
sylv          0.69
="#           0.69
              0.68
achine        0.68
Activations Density 0.015%
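Activation density is the share of sampled tokens on which the feature fires. The sketch below assumes a token counts as firing when its activation exceeds zero; the dashboard's exact threshold and token sample are not stated here, so `activation_density` and its `threshold` default are hypothetical. A density of 0.015% corresponds to roughly one firing token in every 6,667 (1 / 0.00015).

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold; 0.0 is a
    hypothetical default, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: 3 firing tokens out of 20,000 sampled tokens.
acts = [0.0] * 20_000
for idx in (17, 4_200, 15_999):
    acts[idx] = 1.3
print(f"{activation_density(acts):.3f}%")  # prints 0.015%
```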