INDEX
Explanations
careful plus an action or object
Negative Logits
?     1.17
)     1.09
\     1.06
),    1.05
ذریع  1.05
)،    1.01
</i>  0.98
!     0.96
)’    0.95
dır   0.94
Positive Logits
n     1.76
p     1.20
ag    1.16
id    1.14
a     1.13
ن     1.12
el    1.11
at    1.09
ol    1.07
ad    1.06
Activations Density 0.015%