INDEX
Explanations
words related to challenging or being challenged
concepts and actions related to challenging established ideas or societal norms
Negative Logits
gp        -0.73
storage   -0.68
··        -0.67
ng        -0.67
pool      -0.64
gas       -0.64
]}        -0.63
bath      -0.63
abouts    -0.62
anuts     -0.61
Positive Logits
precon           1.36
stereotypes      1.31
assumptions      1.30
orthodoxy        1.14
misconceptions   1.13
myths            1.09
conventional     1.07
prevailing       1.04
notions          1.02
beliefs          1.01
Activations Density: 0.249%