INDEX
Explanations
closing parenthesis followed by punctuation
Negative Logits
و  0.79
u  0.77
ش  0.73
이  0.69
ي  0.66
기  0.64
ين  0.61
на  0.60
り  0.59
ন  0.58
Positive Logits
।  0.65
-  0.55
(  0.52
。  0.51
I  0.47
(  0.47
cence  0.46
cido  0.44
entino  0.43
punkt  0.43
Activations Density: 0.099%