INDEX
Explanations
used after agreement or confirmation
Negative Logits
า	2.02
ب	1.30
ס	1.23
ia	1.14
ك	1.14
ם	1.14
ך	1.12
,	1.11
a	1.05
that	1.05
Positive Logits
I	1.32
for	1.31
ind	1.25
ib	1.06
ot	1.05
W	1.03
Τ	1.03
they	1.02
F	1.02
X	1.02
Activations Density 0.000%