INDEX
Explanations
The word "until" followed by a noun or pronoun
Negative Logits
ı    1.25
ená    1.23
ை    1.23
dır    1.20
וא    1.17
larında    1.14
ın    1.13
kaan    1.13
ıyla    1.13
اوقات    1.13
Positive Logits
ছেন    1.19
서    1.19
و    1.16
ון    0.98
),    0.94
ح    0.91
ס    0.91
с    0.90
Основ    0.89
ির    0.87
Activations Density: 0.179%