INDEX
NEGATIVE LOGITS
2        1.26
\*       0.94
\*       0.90
2        0.88
*        0.88
*        0.88
twenty   0.84
२        0.83
*,       0.82
↵↵       0.79
POSITIVE LOGITS
sType    0.92
take     0.88
romatic  0.88
rest     0.84
shed     0.84
sé       0.83
erci     0.82
Sé       0.82
둴       0.82
sick     0.82
Activations Density 0.020%