Explanations
references to propaganda efforts and media manipulation
Negative Logits
impro         -0.15
esture        -0.15
Compression   -0.14
ocities       -0.14
opard         -0.14
981           -0.13
peripheral    -0.13
Trades        -0.13
ерь           -0.13
chwitz        -0.12
Positive Logits
propaganda    0.38
Prop          0.38
prop          0.37
-prop         0.36
propag        0.35
проп          0.34
PROP          0.34
Prop          0.28
propagation   0.28
PROP          0.27
Activations Density 0.116%
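
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "activations density" means the fraction of sampled token positions on which the feature fires at all; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce the number shown.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: this is one common reading of "density"; the source page
    does not state how its figure was computed.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 116 active positions out of 100,000 tokens.
sample = [1.0] * 116 + [0.0] * 99_884
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.116% corresponds to roughly one active position in every 860 tokens, so the feature is sparse.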