INDEX
Explanations
Russian, Korean, and other non-English words
Negative Logits
as    1.20
to    1.18
ра    1.17
йс    1.02
рс    1.00
are   0.98
той   0.98
見    0.98
то    0.96
軞    0.95
Positive Logits
א     1.88
ת     1.74
EN    1.66
О     1.63
لی    1.55
Z     1.48
ב     1.43
は    1.42
ال    1.40
IC    1.39
Activations Density 0.000%