INDEX
Explanations
various language word endings
Negative Logits
ğini    2.08
ん    1.84
ing    1.80
त    1.72
л    1.71
तून    1.66
س    1.64
ff    1.58
で    1.58
an    1.57
Positive Logits
ных    2.13
THING    1.93
ları    1.91
ların    1.87
ные    1.83
ين    1.75
ный    1.70
lık    1.70
िकी    1.66
dS    1.63
Activations Density 0.132%