INDEX
Explanations
stereotypes and common perceptions
Negative Logits
computational      0.51
computation        0.50
Computational      0.48
computations       0.47
constrained        0.45
Computational      0.45
Suitable           0.44
computationally    0.43
programmable       0.42
Computation        0.42
Positive Logits
stereotype         0.98
stereotypes        0.89
anecdotal          0.84
sadly              0.79
estere             0.77
stereotyp          0.75
stereotypical      0.72
anonymity          0.72
Stere              0.71
misog              0.71
Activations Density: 0.012%