Explanations
phrases related to propaganda
mentions of propaganda and related concepts
Negative Logits
kens     -0.79
bang     -0.70
heed     -0.69
thur     -0.68
ESV      -0.67
IFE      -0.67
aird     -0.66
clud     -0.66
ighth    -0.66
aldo     -0.66
Positive Logits
campaigns         1.14
propaganda        1.01
leaflets          0.98
disinformation    0.97
aganda            0.96
dissemin          0.96
posters           0.94
eering            0.93
tactics           0.93
ploy              0.92
Activations Density: 0.068%