INDEX
Explanations
references to social media
mentions of social media
Negative Logits
nces      -0.90
Cursed    -0.83
ñ         -0.78
Lank      -0.75
gger      -0.71
xual      -0.71
nce       -0.70
wered     -0.70
eters     -0.69
iasis     -0.69
Positive Logits
relations        0.86
cohesion         0.84
psychologists    0.81
psychologist     0.80
networking       0.79
norms            0.79
behavi           0.79
psychology       0.76
cues             0.74
welf             0.74
Activations Density: 0.025%