INDEX
Explanations
punctuation and structural elements in code
Negative Logits
MIS         -0.17
ilde        -0.16
.wp         -0.15
Rebellion   -0.15
ancies      -0.14
ØŃاÙ쨏      -0.13
ÑĢап        -0.13
zych        -0.13
oland       -0.13
icken       -0.13
Positive Logits
unless      0.26
unless      0.25
my          0.22
confess     0.22
die         0.22
my          0.21
cro         0.20
(my         0.20
scalar      0.20
            0.20
Activations Density 0.002%