Explanations
auxiliary verb constructions
Negative Logits
ка    1.27
at    1.24
ل    1.21
ל    1.18
it    1.10
ת    1.09
'    1.05
ه    1.05
if    0.99
’    0.99
Positive Logits
[no visible token]    1.52
be    1.02
of    0.99
is    0.98
dır    0.90
OF    0.83
的声音    0.83
های    0.80
of    0.78
৬    0.77
Activations Density 0.334%