Explanations
avoids unnecessary duplication
Negative Logits
Token        Logit
...”         1.01
…</          0.97
(missing)    0.88
(missing)    0.86
…’           0.83
(missing)    0.81
’)           0.80
’),          0.79
Germania     0.79
,’           0.79
Positive Logits
Token        Logit
avoids       1.14
using        1.06
Also         1.05
Also         1.05
also         1.04
(missing)    0.99
(missing)    0.93
avoiding     0.93
using        0.92
very         0.89
Activations Density 0.397%