INDEX
Explanations
ideas and beliefs that are debunked or challenged, particularly surrounding societal norms and stereotypes
Negative Logits
emouth          -0.68
ktop            -0.66
Interstitial    -0.65
ients           -0.62
orneys          -0.62
artney          -0.62
appointments    -0.61
POLIT           -0.61
×ķ              -0.60
foreseen        -0.60
Positive Logits
icist           1.07
busters         0.96
arily           0.94
ril             0.94
olog            0.90
telling         0.88
mong            0.88
Myth            0.88
ologies         0.88
lore            0.84
Activations Density: 0.111%