Explanations
American or English followed by nouns
Negative Logits
ه        1.60
tiden    1.53
ن        1.51
ം        1.50
ের       1.48
пример   1.48
ات       1.37
ானா      1.37
ként     1.34
لية      1.34
Positive Logits
ic       1.74
ası      1.64
ll       1.59
th       1.50
ln       1.50
row      1.45
ning     1.45
an       1.40
lar      1.40
nt       1.36
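Tables like the two above are commonly built by projecting a feature's decoder direction onto each token's unembedding vector, then keeping the tokens with the largest and smallest scores. The sketch below shows only that ranking step. Several things in it are assumptions, not taken from this dashboard: the projection method itself, the names `logit_effects` and `top_and_bottom`, and the toy two-dimensional vectors.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding vector (assumed to be a column of W_U)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy usage with made-up 2-d vectors (not real model weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "ic": [1.2, -1.0],
    "ll": [0.8, 0.0],
    "tiden": [-1.0, 1.2],
    "ه": [-0.4, 2.0],
}
effects = logit_effects(decoder_dir, unembed)
pos, neg = top_and_bottom(effects, 2)
print("positive:", pos)
print("negative:", neg)
```

A real dashboard would run this over the full vocabulary and might report negative scores as magnitudes. The values listed above are all positive, so the sign convention used there is not stated.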
Activation Density: 0.251%
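Activation density is usually read as the share of token positions on which the feature fires. At 0.251%, that works out to roughly 251 tokens in every 100,000. The dashboard does not give its exact definition, so the strict `> 0` threshold below is an assumption. The function name `activation_density` is likewise hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy usage: 251 active positions out of 100,000.
acts = [0.0] * 99_749 + [0.3] * 251
print(f"{activation_density(acts):.3f}%")
```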