Explanations
Russian, Spanish, or Bengali single letters
Negative Logits
न	3.24
ل	2.53
ন	2.47
l	2.47
s	2.13
دخل	2.13
ン	2.12
ాన్ని	2.09
н	2.02
lardan	2.00
Positive Logits
是	1.74
ý	1.73
𝘨	1.72
𝘬	1.71
𝘦	1.67
ными	1.64
ly	1.63
ছিলেন	1.62
로	1.62
𝘰	1.61
Activations Density: 0.021%