Explanations
yang followed by adjectives
Negative Logits
ب       1.30
of      1.26
ли      1.23
ar      1.20
است     1.20
ра      1.13
↵↵      1.11
er      1.11
ма      1.10
ες      1.10
Positive Logits
yang    1.38
ที่       1.20
molto   1.16
yanı    1.15
が      1.13
jeste   1.13
y       1.12
frumo   1.10
salari  1.09
been    1.09
Activation Density: 0.000%