INDEX
Explanations
nouns indicating roles, professions, or identities of individuals
Negative Logits
Token     Logit
aminer    -0.17
cient     -0.16
spou      -0.16
ctal      -0.16
andal     -0.16
rud       -0.15
SO        -0.15
thương    -0.14
byn       -0.14
ervo      -0.14
Positive Logits
Token     Logit
former    0.24
.k        0.23
member    0.20
frequent  0.19
native    0.16
frequ     0.16
woke      0.15
fixture   0.15
Former    0.15
long      0.15
Activations Density 0.129%
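
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of sampled token positions on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the toy data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 13 of which are active.
toy_activations = [0.0] * 9987 + [1.5] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.129% would correspond to the feature firing on roughly 1.3 of every 1,000 sampled tokens.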