Explanations
punctuation and list structure
Negative Logits
Token         Value
it            0.61
It            0.55
It            0.54
<unused335>   0.49
They          0.48
simply        0.47
s             0.45
ों             0.44
<unused284>   0.44
который       0.44
Positive Logits
Token         Value
,             0.70
،             0.69
(),           0.56
。            0.50
(             0.49
$,            0.49
              0.48
》,           0.46
))            0.46
,             0.46
Activations Density 0.075%