Explanations
words related to distinguishing reality from misinformation
Negative Logits
Fishing    -0.15
olist      -0.15
oli        -0.14
riv        -0.14
earth      -0.14
ator       -0.14
coder      -0.14
ventus     -0.13
Retro      -0.13
enz        -0.13
Positive Logits
factual      0.19
<quote       0.17
unin         0.17
facts        0.16
actual       0.15
chants       0.15
facts        0.15
opak         0.14
falsehood    0.14
distortion   0.14
Activations Density 0.360%