Explanations
words related to challenging or dispelling stereotypes and myths
references to stereotypes and discussions about their implications
Negative Logits
ea            -0.79
avail         -0.74
Liquid        -0.71
liquid        -0.68
ener          -0.66
Aur           -0.65
pending       -0.65
live          -0.64
pload         -0.64
authorized    -0.62
Positive Logits
stereotypes       3.57
stereotype        3.47
stereotyp         2.67
stereotypical     2.66
clich             2.09
caricature        1.94
tropes            1.93
misconceptions    1.78
prejudices        1.71
cliché            1.71
Activations Density 0.026%