Explanations
numbered list item followed by punctuation
Negative Logits
Of            0.79
Dedicated     0.71
Absolutely    0.71
Independent   0.71
OF            0.70
dotyczące     0.69
Of            0.69
About         0.69
Improving     0.69
Making        0.68
Positive Logits
.")           0.86
."),          0.81
.").          0.80
\}.           0.79
isone         0.77
<unused1847>  0.76
<unused1839>  0.75
<unused1855>  0.75
^^            0.74
irrit         0.74
Activations Density: 0.058%