INDEX
Explanations
Point, points, and what follows
Negative Logits
ри    1.74
ні    1.48
ру    1.25
ных   1.18
ري    1.18
ς     1.16
v     1.13
ك     1.13
ро    1.11
то    1.11
Positive Logits
n     1.38
⦖     1.09
ak    1.06
>     1.05
ন     1.05
밂    1.05
न     1.03
nä    0.99
ique  0.98
น     0.94
Activations Density: 0.067%