Explanations
terms related to misinformation and propaganda
Negative Logits
imen      -0.75
rared     -0.69
airo      -0.67
urat      -0.67
onga      -0.67
atta      -0.66
cise      -0.65
inence    -0.65
atal      -0.65
ropri     -0.64
Positive Logits
perpetrated       0.94
misinformation    0.83
disinformation    0.82
spreads           0.82
mong              0.78
propag            0.77
hoax              0.77
tactics           0.71
ument             0.71
detector          0.71
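The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A minimal sketch of how such lists could be produced, assuming (as is common for sparse autoencoder dashboards, but not stated on this page) that each value is the feature's decoder direction projected through the model's unembedding matrix. The function names `logit_effects` and `top_logits` and the toy inputs are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a d_model-sized feature direction through a d_model x vocab
    unembedding matrix (list of rows), giving one logit effect per token.
    Assumption: this is how the dashboard values are derived."""
    return {
        tok: sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        for j, tok in enumerate(vocab)
    }


def top_logits(effects: Dict[str, float], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, -0.5],
                        [[0.2, -0.4, 0.9],
                         [0.6, 0.2, -0.2]],
                        ["a", "b", "c"])
positive, negative = top_logits(effects, k=1)
print(positive, negative)
```

With `k=10` and the real model weights, the same ranking would give lists shaped like the ones shown above.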
Activation Density: 0.076%
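Assuming the density figure is the percentage of sampled tokens on which the feature's activation is above zero (the page does not define it), 0.076% corresponds to roughly 76 firing tokens per 100,000. A minimal sketch of that calculation; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    Assumption: a token counts as 'active' when its activation is strictly positive."""
    if not activations:
        raise ValueError("need at least one activation")
    fires = sum(1 for a in activations if a > threshold)
    return 100.0 * fires / len(activations)


# 76 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 76 + [0.0] * (100_000 - 76)
print(f"{activation_density(sample):.3f}%")
```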