INDEX
NEGATIVE LOGITS
'       1.35
}$      1.00
一      0.97
‘       0.94
↵       0.89
0       0.82
</td>   0.81
어가    0.80
}       0.79
המ      0.79
POSITIVE LOGITS
ing     1.30
o       1.28
al      1.20
can     1.03
ia      1.02
as      1.02
in      1.02
aj      1.02
ud      1.00
can     1.00
Activations Density 0.005%