Explanations
references to social comparisons and competition among individuals
Negative Logits
ereotype  -0.15
iland     -0.14
Ion       -0.14
.rs       -0.14
Ön        -0.14
форми     -0.13
ekler     -0.13
riot      -0.13
stere     -0.13
power     -0.13
Positive Logits
fucks     0.22
fucked    0.22
fuck      0.20
fuck      0.20
Fuck      0.19
fucking   0.18
kee       0.17
cunt      0.17
Fucking   0.17
FUCK      0.17
Activations Density 0.866%