Explanations
proper nouns, particularly names of individuals and organizations
Negative Logits
bsite    -0.16
ÄĻ       -0.15
ipur     -0.15
phinx    -0.15
ustum    -0.14
érc      -0.14
ixin     -0.14
ffen     -0.14
prompt   -0.14
simply   -0.14
Positive Logits
son      0.14
éĽĦ      0.14
erts     0.13
ãĥ³ãĥķ   0.13
bie      0.13
ubbles   0.13
Herbert  0.13
dÄĽl     0.13
arity    0.13
scar     0.13
Activations Density 0.205%
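
Several tokens in the logit lists above (for example ãĥ³ãĥķ, éĽĦ, and dÄĽl) look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes the tokens come from a GPT-2-style byte-level vocabulary, which this page does not confirm, and maps each character back to its byte before decoding as UTF-8. The names `bytes_to_unicode`, `UNICODE_TO_BYTE`, and `decode_token` are illustrative and not taken from the page.

```python
def bytes_to_unicode():
    """Map each byte 0..255 to the printable character that GPT-2-style
    byte-level BPE uses to display it (assumed vocabulary convention)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256 and above.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a displayed token back into text; invalid UTF-8 becomes U+FFFD."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ³ãĥķ", "éĽĦ", "dÄĽl", "Herbert"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥ³ãĥķ decodes to ンフ, éĽĦ to 雄, and dÄĽl to děl. Tokens that are fragments of a multi-byte character do not form valid UTF-8 on their own, so they show the replacement character instead.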