Explanations
names of individuals or characters
Negative Logits
Token            Logit
сь (ÑģÑĮ)        -0.20
jac              -0.19
jt               -0.18
otherwise        -0.17
ke               -0.17
verage           -0.17
jang             -0.17
jes              -0.17
jad              -0.17
jk               -0.17
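The top negative token is stored in the tokenizer's byte-level form, ÑģÑĮ, which decodes to the Cyrillic string сь. Assuming the model uses a GPT-2-style byte-level BPE tokenizer (an assumption; the dashboard does not name the model), each printable character stands for one raw byte. The sketch below rebuilds that byte-to-character table and inverts it to recover the UTF-8 text. The helper names are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style reversible byte -> printable-character table (assumed tokenizer)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes that are not already printable get remapped to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so each printable character maps back to its raw byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string (hypothetical helper) back into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ÑģÑĮ"))  # сь
print(repr(decode_token("Ġer")))  # ' er' (Ġ encodes a leading space)
```

The same decoding explains why several tokens in these tables look like word fragments: the leading-space marker may have been dropped when the page was captured.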
Positive Logits
Token            Logit
er               0.20
ing              0.18
ume              0.17
ding             0.17
ledge            0.17
oci              0.16
EntryPoint       0.15
ness             0.15
erb              0.15
ful              0.15
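Dashboards of this kind usually rank tokens by the feature's direct effect on the output logits. A minimal sketch, assuming that effect is the feature's decoder direction multiplied by the model's unembedding matrix (the dashboard does not state its exact formula or whether any normalization is applied), with `w_dec_row`, `w_unembed`, and `top_logit_effects` as hypothetical names:

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    w_dec_row: (d_model,) decoder direction of one feature (assumed input).
    w_unembed: (d_model, n_vocab) unembedding matrix (assumed input).
    vocab: list of token strings indexed by token id.
    """
    logits = w_dec_row @ w_unembed  # (n_vocab,) effect on each token's logit
    order = np.argsort(logits)      # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with made-up weights, only to show the ranking.
vocab = ["er", "ing", "jac", "jt", "ness"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([[0.20, 0.18, -0.20, -0.19, 0.15],
                      [0.0, 0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # [('jac', -0.2), ('jt', -0.19)]
print(pos)  # [('er', 0.2), ('ing', 0.18)]
```

Because the ranking uses only weights, it describes what the feature pushes the model to predict, which can differ from the contexts where it fires.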
Activation density: 0.097% (share of tokens on which the feature activates)
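The 0.097% figure is read here as the fraction of token positions with a nonzero feature activation. A minimal sketch, assuming activations are collected into one flat array over a token sample (the sample size and threshold used by the dashboard are not given; `activation_density` is a hypothetical name):

```python
import numpy as np


def activation_density(acts: np.ndarray) -> float:
    """Percentage of token positions where the feature's activation is above zero."""
    return 100.0 * float(np.mean(acts > 0))


# Toy sample: 97 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:97] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.097%
```

A density this low means the feature fires on roughly 1 token in 1,000, so it is a sparse, fairly specific feature.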