INDEX
Explanations
possessive pronouns and words related to personal relationships and connections
Negative Logits
dma      -0.16
mun      -0.14
æ´¥      -0.14
being    -0.14
larg     -0.13
418      -0.13
hm       -0.13
SEM      -0.13
imens    -0.13
irk      -0.13
Positive Logits
alive      0.17
ç´Ģ        0.17
iola       0.16
tabs       0.15
hold       0.15
_tokenize  0.15
enant      0.15
èĦ         0.15
alive      0.14
akes       0.14
Activations Density 0.032%