INDEX
Explanations
words related to philosophical concepts and qualities, as well as characteristics associated with gender stereotypes
Negative Logits
Canaver   -0.76
etsy      -0.71
QUIRE     -0.71
ãģ®éŃĶ    -0.68
ISTORY    -0.65
CLAIM     -0.64
RIPT      -0.64
acas      -0.64
Sparks    -0.62
ä½ľ       -0.60
Positive Logits
ones           0.89
versa          0.86
etc            0.80
-)             0.79
+.             0.76
-.             0.74
*.             0.73
respectively   0.72
entric         0.71
ecided         0.70
Activations Density 0.406%