Explanations
topics related to stereotypes and generalizations, particularly about gender and race
gender roles and norms
Negative Logits
Token             Logit
BorderSide        -0.35
edile             -0.35
ably              -0.34
lotl              -0.34
SaveChangesAsync  -0.34
ModelSerializer   -0.34
animous           -0.33
blotting          -0.33
wiście            -0.33
amicable          -0.33
Positive Logits
Token        Logit
stereotypes  0.71
stereotype   0.68
myſelf       0.65
Monfieur     0.58
pigeon       0.58
stereotyp    0.56
Anſ          0.56
prejudices   0.56
ſelves       0.52
pigeon       0.52
Activations Density: 0.086%