INDEX
Explanations
references to propaganda and its impact on public perception
Negative Logits
сім         -0.14
business    -0.14
trans       -0.13
impro       -0.13
ziej        -0.13
imit        -0.13
pert        -0.13
638         -0.13
math        -0.13
trans       -0.13
Positive Logits
propaganda  0.24
spin        0.22
Spin        0.22
spin        0.21
Spin        0.21
-spin       0.20
spins       0.19
istrovství  0.18
filtro      0.18
propag      0.18
Activations Density: 0.262%