Explanations
option numbering or headings
Negative Logits
off     0.72
net     0.70
not     0.64
them    0.64
mat     0.63
no      0.63
to      0.62
nya     0.62
talk    0.61
time    0.61
Positive Logits
<unused2162>    0.66
<unused1753>    0.64
Méd             0.62
testAvg         0.61
Мар             0.61
Prensa          0.61
<unused2143>    0.59
<unused986>     0.59
<unused989>     0.58
㛲              0.58
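This page does not say how the logit scores above are computed. One common convention for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the highest- and lowest-scoring tokens. The sketch below illustrates that ranking step on toy data. The names `logit_scores`, `top_and_bottom_tokens`, `decoder_dir`, and `W_U` are hypothetical, and the numbers do not reproduce the lists above.

```python
import numpy as np

def logit_scores(decoder_dir: np.ndarray, W_U: np.ndarray) -> np.ndarray:
    """Project a feature direction (d_model,) through an unembedding (d_model, vocab).

    Assumption: the per-token score is decoder_dir @ W_U; the dashboard
    may use a different (e.g. normalized) definition.
    """
    return decoder_dir @ W_U

def top_and_bottom_tokens(scores: np.ndarray, vocab: list, k: int = 3):
    """Return the k highest-scoring and the k lowest-scoring (token, score) pairs."""
    order = np.argsort(scores)  # indices sorted by ascending score
    bottom = [(vocab[i], float(scores[i])) for i in order[:k]]
    top = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top, bottom

if __name__ == "__main__":
    # Toy vocabulary and a 2-dimensional "model" (hypothetical data).
    vocab = ["tok0", "tok1", "tok2", "tok3", "tok4"]
    W_U = np.array([[1.0, -1.0, 0.5, 0.0, 2.0],
                    [0.0, 1.0, 0.25, -2.0, 0.0]])
    decoder_dir = np.array([1.0, 1.0])
    scores = logit_scores(decoder_dir, W_U)
    top, bottom = top_and_bottom_tokens(scores, vocab, k=2)
    print("top:", top)
    print("bottom:", bottom)
```

Sorting once with `argsort` and reading both ends gives the positive and negative lists in a single pass over the vocabulary.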
Activation density: 0.065%
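Activation density is read here, as an assumption, as the fraction of sampled token positions on which the feature's activation is above zero; the page does not state the threshold or the sample size. The hypothetical helper `activation_density` below computes that fraction. The toy data has 65 active positions out of 100,000, which corresponds to a density of 0.065%.

```python
import numpy as np

def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means activation > 0 over a token sample.
    """
    active = activations > threshold  # boolean mask of firing positions
    return float(active.mean())

if __name__ == "__main__":
    acts = np.zeros(100_000)  # hypothetical activations over 100k tokens
    acts[:65] = 1.0           # the feature fires on 65 of them
    print(f"density = {activation_density(acts):.3%}")
```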