INDEX
Explanations
proper nouns, particularly names of individuals and their associated details
Negative Logits
repl         -0.15
ubiquitous   -0.14
multif       -0.14
adaki        -0.13
aded         -0.13
.vec         -0.13
uzey         -0.13
agan         -0.13
716          -0.13
.rev         -0.13
Positive Logits
pty          0.17
verter       0.15
appro        0.14
ynet         0.14
phins        0.14
setattr      0.14
ÑĤин         0.13
çuk          0.13
heavy        0.13
assim        0.13
Activations Density 0.056%