INDEX
Explanations
letters, numbers, and foreign words
Negative Logits
t       0.47
م       0.47
ா       0.45
ا       0.45
ం       0.43
ק       0.41
क       0.40
信息    0.39
त       0.39
a       0.39
Positive Logits
F       0.45
for     0.42
of      0.40
8       0.39
पाँच    0.38
ไม่     0.38
जाँच    0.38
E       0.38
Y       0.38
deux    0.37
Activations Density: 1.207%