INDEX
Explanations
proper names and titles, particularly those of people and literary works
Negative Logits
nt          -0.23
ma          -0.22
ning        -0.20
ro          -0.20
ries        -0.20
nya         -0.20
ness        -0.20
mon         -0.19
me          -0.18
soever      -0.18
Positive Logits
'nun         0.21
ffset        0.21
’nun         0.20
gether       0.20
alesce       0.19
ject         0.17
hiba         0.17
ceph         0.17
ymous        0.17
elho         0.16
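The page does not say how these logit lists were produced. A common approach, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. The following is a minimal sketch under that assumption; the function name `top_logit_effects` and its arguments are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec_feature, W_U, vocab, k=10):
    """Hypothetical sketch: rank tokens by the feature's direct logit effect.

    Assumes the effect is the decoder direction (shape: d_model) multiplied
    by the unembedding matrix W_U (shape: d_model x vocab_size).
    """
    effects = np.asarray(w_dec_feature) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
w = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, -0.4],
                [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(w, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # tokens pushed down most
print(pos)  # tokens pushed up most
```

Under this assumption, the two lists above would be the ten lowest and ten highest entries of that projection.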
Activations Density 0.512%
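Activation density is typically reported as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the page does not state the threshold or the sample it used, and the helper name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Toy per-token activations for one feature.
acts = [0.0, 0.3, 0.0, 0.0, 1.2]
density = activation_density(acts)
print(f"Activations Density {density * 100:.3f}%")
```

On this reading, a density of 0.512% means the feature is active on roughly 1 in every 195 tokens.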