INDEX
Explanations
foreign negative/question particles
Negative Logits
p    1.20
i    1.09
re    1.08
m    1.08
ro    1.07
ى    1.02
a    0.99
n    0.98
یل    0.97
"    0.97
Positive Logits
𝟎    1.13
۰    1.05
কে    1.04
on    1.01
0    0.99
_{    0.95
০    0.93
an    0.93
িত    0.91
០    0.91
Activations Density 0.007%