INDEX
Explanations
phrases related to misinformation, propaganda, fake news, and lies
references to fake news and misinformation
Negative Logits
foreseen              -0.76
onen                  -0.71
cit                   -0.70
rontal                -0.69
ktop                  -0.67
externalActionCode    -0.66
amiya                 -0.65
iosyn                 -0.64
airo                  -0.64
emale                 -0.64
Positive Logits
perpetrated       1.15
debunked          1.15
propag            1.13
pedd              1.12
spew              1.10
concoct           1.06
mong              1.05
debunk            1.05
disinformation    1.05
misinformation    1.04
Activations Density 0.212%