Explanations
No Explanations Found
Negative Logits
Token    Logit
𝘁    2.77
𝘰    2.59
burse    2.57
𝚝    2.55
𝘦    2.51
𝘳    2.51
𝘵    2.50
َّ    2.48
儡    2.45
`--    2.45
Positive Logits
Token    Logit
м    4.81
ه    4.81
a    4.61
ו    4.52
i    4.07
ي    3.97
ে    3.94
۰    3.79
ン    3.75
ی    3.69
Activations Density: 0.157%