Explanations
terms related to propaganda
references to propaganda and its various forms and implications
Negative Logits
ergy      -0.90
nings     -0.72
earth     -0.69
clud      -0.68
cker      -0.67
20439     -0.65
Merit     -0.64
thur      -0.64
wives     -0.64
Eucl      -0.64
Positive Logits
aganda          1.14
propaganda      0.90
posters         0.88
leaflets        0.87
suppression     0.85
disinformation  0.81
dissemin        0.80
blitz           0.77
poster          0.76
pamphlet        0.76
Activation Density: 0.022%