INDEX
Explanations
terms related to incorrect beliefs or understanding
terms related to misconceptions and misinformation
Negative Logits
Token      Logit
igree      -0.74
atom       -0.72
negie      -0.71
ramid      -0.69
edom       -0.67
imen       -0.66
itar       -0.66
incinn     -0.66
ropri      -0.65
gans       -0.65
Positive Logits
Token              Logit
misunderstanding   1.01
misconceptions     1.00
misconception      0.94
misunderstand      0.87
inaccur            0.85
misinformation     0.84
misinterpret       0.84
incorrectly        0.81
misunderstood      0.77
inaccurate         0.76
Activations Density: 0.044%
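
The density figure above is most naturally read as the share of sampled token positions on which this feature activates at all. The sketch below shows one way such a number could be computed. It assumes density means "fraction of positions with activation above a threshold"; the function name `activation_density`, the threshold default, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as "active" when its value is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.2]
density = activation_density(sample)

# Format as a percentage, in the same style as the dashboard figure.
print(f"Activations Density: {density:.3%}")
```

Under this reading, a density of 0.044% would correspond to roughly 4 or 5 active positions per 10,000 sampled tokens, which fits a sparse feature tied to a narrow vocabulary such as the tokens with the highest positive logits.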