Explanations: year, number, page, edition
Negative Logits
l            0.70
><           0.70
returning    0.69
ac           0.67
m            0.67
return       0.67
dan          0.64
using        0.64
is           0.63
are          0.63
Positive Logits
<unused394>     1.01
Reprint         0.93
<unused412>     0.92
<unused480>     0.92
<unused1930>    0.89
<unused410>     0.87
<unused2094>    0.86
<unused938>     0.86
<unused2078>    0.85
<unused1679>    0.84
Activations Density: 0.001%