INDEX
Explanations
elements related to societal control and dystopian themes
Negative Logits
Hayes     -0.19
atron     -0.18
831       -0.17
311       -0.16
cation    -0.15
Bans      -0.15
811       -0.15
uvre      -0.14
591       -0.14
931       -0.14
Positive Logits
ilater       0.19
saf          0.18
Fet          0.17
alph         0.17
iences       0.17
ительства    0.16
ordin        0.16
imb          0.16
rights       0.16
pees         0.16
Activations Density 0.536%
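
A minimal sketch of how a density figure like the one above could be computed, assuming it is the fraction of token positions on which the feature's activation is above zero. The function name, the threshold parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions where the
    feature fires at all. An empty input yields 0.0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 5 of 1000 token positions.
example = [1.2, 0.4, 3.1, 0.9, 2.0] + [0.0] * 995
density = activation_density(example)
print(f"{density:.3%}")
```

Under this reading, a density of 0.536% would mean the feature is active on roughly 5 of every 1000 tokens.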