INDEX
Explanations
Proper names and specific references related to individuals
Negative Logits
Token          Logit
ehir           -0.17
innamon        -0.17
ossible        -0.14
_exceptions    -0.14
rette          -0.14
hei            -0.13
ksam           -0.13
asu            -0.13
acter          -0.13
eah            -0.13
Positive Logits
Token          Logit
son            0.20
sson           0.19
oldt           0.15
desc           0.14
spath          0.14
pher           0.14
utow           0.14
son            0.14
ellow          0.14
IEWS           0.14
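The logit lists rank vocabulary tokens by how strongly this feature directly raises or lowers their logits. The following is a minimal sketch of how such lists are commonly produced, assuming each token's score is the feature's decoder direction projected through the model's unembedding matrix. The names `decoder_dir`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical and do not come from this page.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: decoder_dir has shape (d_model,) and is the feature's
    decoder direction; W_U has shape (d_model, n_vocab) and is the
    model's unembedding matrix. Both are hypothetical inputs.
    """
    # Direct effect of the feature direction on every vocabulary logit.
    scores = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (n_vocab,)
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional model with a 3-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.20, -0.17, 0.05],
                [0.30, 0.10, -0.40]])
vocab = ["son", "ehir", "desc"]
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=1)
print(neg, pos)
```

Two tokens can share the same displayed string (as "son" does in the positive list) when they differ only in invisible details such as a leading space, so lists built this way are indexed by token id rather than by string.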
Activation density: 0.244%
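Activation density is usually read as the share of sampled token positions on which the feature fires. Here is a minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the threshold parameter are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: acts holds one activation value per sampled token position.
    """
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

print(activation_density([0.0, 0.5, 0.0, 0.0]))  # 25.0
```

Under this reading, a density of 0.244% means the feature is active on roughly 244 of every 100,000 tokens, so it fires sparsely.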