INDEX
Explanations
code snippets enclosed in backticks
Negative Logits
•••         -0.85
••••        -0.81
neb         -0.77
Paredes     -0.77
avajillas   -0.74
er          -0.73
Penh        -0.72
Norwood     -0.71
oub         -0.71
—"          -0.71
Positive Logits
`           1.87
.`          1.83
=`          1.83
:`          1.76
{`          1.68
(`          1.65
>`          1.63
)`          1.62
(`          1.60
`<          1.55
Activations Density 0.079%