INDEX
Explanations
tables summarizing differences
Negative Logits
1.40
1.35
1.34
1.30
1.24
1.18
1.15
1.12
1.10
1.07
Positive Logits
assignment    0.77
--");         0.76
--',          0.73
playthrough   0.73
)}$,          0.72
)..           0.71
assignments   0.71
..)           0.71
streamline    0.69
}){           0.67
Activations Density 0.017%