INDEX
Explanations
references to George Orwell and his works
Negative Logits
chin    -0.15
ικα     -0.15
奴      -0.15
oid     -0.14
747     -0.14
REP     -0.13
bfd     -0.13
ноз     -0.13
yn      -0.13
burg    -0.13
Positive Logits
oldem        0.17
elage        0.15
.reporting   0.14
comp         0.14
ünd          0.14
Tou          0.14
maj          0.14
amilia       0.14
è§           0.14
benh         0.13
Activations Density 0.036%