INDEX
Explanations
words related to misinformation, inaccuracies, and misconceptions
terms related to inaccuracies and misinformation
Negative Logits
amen      -0.77
agra      -0.75
mun       -0.75
gans      -0.73
imen      -0.72
atom      -0.71
shall     -0.69
spring    -0.69
lining    -0.68
alm       -0.68
Positive Logits
misconceptions      1.08
misinformation      1.07
inaccur             1.06
misconception       1.01
inaccurate          1.00
misunderstanding    0.96
incorrectly         0.95
misinterpret        0.92
mistaken            0.92
misunderstand       0.90
Activations Density: 0.022%