INDEX
Explanations
references to collective identities or groups within a social context
Negative Logits
Bauer     -0.64
Tale      -0.63
참고      -0.63
zt        -0.61
iop       -0.57
ar        -0.56
ваз       -0.56
dichos    -0.56
Vaz       -0.56
POS       -0.55
Positive Logits
everyone     3.21
everyone     3.09
Everyone     2.96
everybody    2.95
Everyone     2.93
everybody    2.87
Everybody    2.87
Everybody    2.76
EVERYONE     2.68
EVERY        1.98
Activations Density: 0.057%