INDEX
Explanations
references to names or nominative terms
Negative Logits
Token             Logit
BuilderInterface  -0.18
fü                -0.16
inha              -0.16
anie              -0.15
anja              -0.15
.eng              -0.15
ENCIES            -0.14
Morton            -0.14
iences            -0.14
Feng              -0.14
Positive Logits
Token    Logit
nom      0.30
nom      0.28
Nom      0.27
Nom      0.24
inal     0.22
adic     0.21
ination  0.21
inate    0.20
ascus    0.20
NOM      0.20
Activations Density: 0.007%