INDEX
Explanations
phrases related to social media
references to social media
Negative Logits
nces        -0.98
xual        -0.83
ilts        -0.77
Cursed      -0.74
shall       -0.71
atche       -0.70
Centauri    -0.69
ered        -0.68
ARDS        -0.66
ras         -0.66
Positive Logits
networking  0.96
networks    0.94
izing       0.90
ized        0.83
gatherings  0.82
istic       0.80
norms       0.78
izers       0.78
ization     0.78
cues        0.76
Activations Density: 0.026%