INDEX
Explanations
comparison table or structured lists
Negative Logits
)}$     1.12
}}$     1.10
}$      1.08
}}$     1.03
}$.     1.02
)}$.    1.01
}$,     0.98
)}$,    0.97
)$:     0.95
\}$.    0.94
Positive Logits
1.80
1.79
1.74
1.72
1.72
1.53
1.52
………………………………..    1.49
1.47
1.46
Activations Density 0.084%