Explanations
articles and specific nouns referring to relationships and familial connections
Negative Logits
жив    -0.18
rende    -0.15
osa    -0.15
ãĥĥãĥĹ    -0.14
oss    -0.14
567    -0.14
اش    -0.14
rz    -0.14
nda    -0.13
raf    -0.13
Positive Logits
apesh    0.18
arella    0.16
edii    0.15
Guy    0.15
_nl    0.15
velt    0.15
apult    0.15
odian    0.15
_UNICODE    0.15
aginator    0.14
Activations Density 0.004%