Explanations
words related to misinformation and deception
phrases related to falsehoods or misinformation
Negative Logits
hens                -1.03
guiActiveUnfocused  -0.92
hetti               -0.88
ajo                 -0.79
mun                 -0.77
aldo                -0.76
ODY                 -0.75
xual                -0.75
APTER               -0.73
rador               -0.72
Positive Logits
positives           1.02
accuser             0.85
guiActiveUn         0.84
false               0.82
dich                0.81
falsely             0.76
unfocusedRange      0.75
guiIcon             0.73
negatives           0.72
assumptions         0.72
Activations Density: 0.019%