INDEX
Explanations
various forms of stereotypes
references to stereotypes and myths
Negative Logits
inth      -0.80
ateur     -0.77
ayan      -0.75
sterdam   -0.74
rique     -0.72
Sync      -0.68
undred    -0.66
ighth     -0.66
hner      -0.66
ucha      -0.65
Positive Logits
stereotyp     0.98
stereotype    0.89
stereotypes   0.89
notions       0.81
è¦ļéĨĴ        0.81
clich         0.78
msec          0.73
depictions    0.73
portray       0.73
Monstrous     0.71
Activations Density: 0.047%