Explanations
stereotypes and common perceptions
Negative Logits
NMR       0.43
ಉತ್ತಮ       0.39
liquer    0.37
يكم       0.37
NMR       0.36
ентите    0.36
沚         0.36
مرا       0.36
MRT       0.35
IO        0.35
Positive Logits
stereotype       1.68
stereotypes      1.59
stereotyp        1.55
stereotypical    1.55
стере            1.23
popularly        1.16
Stere            1.15
stere            1.13
perceptions      1.13
estere           1.13
Activations Density: 0.068%