INDEX
Explanations
titles and references to well-known dystopian literature
Negative Logits
ì´Į      -0.16
_PATCH   -0.15
unan     -0.15
362      -0.15
501      -0.15
482      -0.14
arith    -0.14
廳       -0.14
602      -0.14
trench   -0.13
Positive Logits
dyst          0.23
Hand          0.23
/command      0.18
Hand          0.18
Hulu          0.18
Commander     0.18
ut            0.17
sponsor       0.17
reproductive  0.17
wives         0.16
Activations Density 0.012%