INDEX
Explanations
references to political structures and organizations
Negative Logits
фунда               -0.17
krev                -0.16
amburger            -0.15
GenerationStrategy  -0.14
opp                 -0.14
Федераль            -0.14
Russell             -0.14
õi                  -0.14
ITHER               -0.14
moderator           -0.13
Positive Logits
Party               0.27
Lenin               0.27
party               0.23
Stalin              0.23
PARTY               0.23
Party               0.21
Workers             0.20
Worker              0.20
GPU                 0.20
NK                  0.19
Activations Density 0.047%