Explanations
beginnings of compound words
Negative Logits
ad    1.28
an    1.14
ir    1.13
та    1.09
z     1.01
the   0.98
w     0.98
u     0.95
is    0.89
us    0.89
Positive Logits
8     0.68
bör   0.63
৮     0.63
টি    0.61
↵↵    0.61
オ    0.60
Ι     0.58
க்    0.57
lüğ   0.54
烁    0.54
Activations Density: 4.074%