Explanations
references to political control and totalitarianism
Negative Logits
illez          -0.16
gage           -0.15
ellungen       -0.15
oggles         -0.14
сит            -0.14
گذ             -0.13
ijo            -0.13
.ba            -0.13
ampie          -0.13
ùi             -0.13
Positive Logits
Winston         0.30
Orwell          0.28
inston          0.28
Emmanuel        0.22
dyst            0.21
Party           0.21
surveillance    0.21
Euras           0.20
Brave           0.20
Animal          0.20
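The two lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. Below is a minimal sketch of how such lists are commonly produced. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder direction with a token's unembedding vector. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical and do not come from the model behind this dashboard.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Score each token by the dot product of the feature's decoder direction
    # with that token's unembedding vector (assumed scoring rule).
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by score: the head holds the most negative tokens,
    # the reversed list holds the most positive ones.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]
    top = list(reversed(ranked))[:k]
    return top, bottom

# Toy 2-dimensional example; these vectors are invented for illustration.
toy_unembed = {
    "Winston": [0.5, 0.1],
    "Orwell": [0.4, -0.3],
    "illez": [-0.2, 0.7],
    "gage": [-0.1, 0.0],
}
effects = logit_effects([1.0, 0.0], toy_unembed)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)
print("negative:", bottom)
```

In a real model the same ranking would run over the full vocabulary, and subword pieces such as "inston" and "Euras" can appear alongside whole words.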
Activation Density 0.088%
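The density figure is the share of tokens on which the feature fires. A value of 0.088% corresponds to roughly 88 active tokens per 100,000. The sketch below assumes that "fires" means an activation strictly above zero, because this page does not state the threshold the dashboard uses. `activation_density` is a hypothetical helper, and the activations are synthetic.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    # Fraction of tokens whose activation exceeds the threshold
    # (a threshold of 0.0 is an assumption, not the dashboard's stated rule).
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0

# Synthetic check: 88 firing tokens out of 100,000.
acts = [1.0] * 88 + [0.0] * (100_000 - 88)
density = activation_density(acts)
print(f"{density:.3%}")
```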