INDEX
Negative Logits
a       1.00
this    0.94
this    0.88
any     0.88
nt      0.82
ating   0.80
an      0.78
man     0.78
the     0.77
so      0.77
POSITIVE LOGITS
`)      0.74
厈      0.70
        0.69
,{\     0.68
tır     0.67
        0.64
(["     0.63
;'>     0.62
xcsche  0.61
'[      0.61
Activations Density 0.001%