Explanations
code structures and syntactical elements
Negative Logits
Token     Logit
purpoſe   -1.04
pleaſure  -0.99
Theſe     -0.97
myſelf    -0.96
uſe       -0.94
houſe     -0.93
iſt       -0.93
ſtate     -0.92
leaſt     -0.92
ſeveral   -0.92
Positive Logits
Token     Logit
and       0.99
,         0.86
or        0.77
in        0.75
which     0.75
as        0.69
at        0.66
for       0.65
also      0.63
on        0.62
Activations Density: 0.725%