INDEX
Explanations
mentions of social media
Negative Logits
nces     -0.98
ilts     -0.74
urat     -0.73
Cursed   -0.72
shall    -0.72
xual     -0.71
atche    -0.70
ARDS     -0.67
Flore    -0.64
Ridge    -0.64
Positive Logits
networks     1.01
networking   0.98
izing        0.90
gatherings   0.83
media        0.83
istic        0.81
ized         0.80
isms         0.79
cues         0.78
platforms    0.78
Activations Density 0.027%