INDEX
Explanations
instances where someone is being criticized for making a mistake or an error in judgment
common themes of mistakes or misconceptions in understanding or approaches
Negative Logits
idable    -0.72
orem      -0.71
ivably    -0.70
ãĥ´ãĤ¡    -0.68
Pry       -0.67
venge     -0.66
Sever     -0.64
Heights   -0.64
bolt      -0.64
shall     -0.63
Positive Logits
simplistic        1.28
misconceptions    1.17
misunderstanding  1.11
precon            1.08
privile           1.08
stereotypes       1.08
reliance          1.06
stereotyp         1.05
assumptions       1.04
misinformation    1.04
Activations Density: 0.705%