INDEX
Explanations
New Auto-Interp: instances of specific coded prefixes or building blocks in words
Negative Logits
Token     Logit
els       -0.20
of        -0.20
ow        -0.19
ens       -0.19
elen      -0.19
ent       -0.19
ogs       -0.19
ene       -0.18
en        -0.18
ะ         -0.18
Positive Logits
Token     Logit
er        0.24
eri       0.21
erer      0.21
hyth      0.20
hythm     0.20
iginal    0.19
uncated   0.19
ighth     0.18
erin      0.18
æľµ       0.18
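Top positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes that method; the dashboard does not say how its lists were computed. The function name `top_logit_effects`, the pure-Python list layout, and the toy vocabulary and numbers in the usage note are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def top_logit_effects(
    decoder_dir: List[float],    # hypothetical feature decoder vector, length d_model
    unembed: List[List[float]],  # hypothetical unembedding matrix, shape [d_model][vocab]
    vocab: List[str],            # token strings, one per unembedding column
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) token/logit pairs, k of each."""
    d_model = len(decoder_dir)
    # Logit effect on each token = dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest (most negative) first
    positive = ranked[::-1][:k]    # largest (most positive) first
    return negative, positive


# Toy usage with made-up values (not the dashboard's data).
neg, pos = top_logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.24, -0.20, 0.10, -0.05], [9.0, 9.0, 9.0, 9.0]],
    vocab=["er", "els", "hyth", "of"],
    k=2,
)
print(neg)
print(pos)
```

With the second decoder coordinate set to zero, only the first unembedding row contributes, so the toy ranking is simply that row sorted. In practice the same computation is a single matrix-vector product over the full vocabulary.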
Activation Density: 0.139%
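A density figure such as 0.139% is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the dashboard does not state its threshold or the token sample it used. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy usage: 139 firing tokens out of 100,000 gives 0.139 (percent).
toy = [0.0] * 100_000
for i in range(139):
    toy[i] = 1.0
print(activation_density(toy))
```

A density this low means the feature is sparse: it is active on roughly 1 token in 720.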