INDEX
Explanations
numbers following certain punctuation
Negative Logits
(     1.03
be    0.88
in    0.75
'     0.75
I     0.73
that  0.66
an    0.63
le    0.62
H     0.59
도    0.58
Positive Logits
на    0.96
ون    0.83
า     0.78
ке    0.75
ان    0.73
માં   0.71
тэй   0.71
да    0.70
良い  0.69
い    0.68
Activations Density: 1.313%