Explanations
proper nouns, especially names of authors and characters in literature
Negative Logits
.fast       -0.14
.vec        -0.14
>ID         -0.14
adaki       -0.14
گاÙĨ        -0.14
ubiquitous  -0.13
%X          -0.13
otre        -0.13
akan        -0.13
.failure    -0.13
Positive Logits
pty         0.15
verter      0.15
Commod      0.15
assim       0.15
setattr     0.14
bast        0.14
phins       0.14
imp         0.14
ño          0.13
heavy       0.13
Activations Density: 0.050%