Explanations
pronouns and possessive adjectives indicating relationships between people or entities
pronouns and possessives
Negative Logits
exp         -0.35
เช          -0.34
de          -0.34
cái         -0.34
ูล          -0.33
bu          -0.32
of          -0.31
שב          -0.31
từ          -0.31
sounding    -0.30
Positive Logits
Him         0.83
meille      0.81
Them        0.76
Them        0.75
them        0.72
Him         0.71
Himself     0.71
conmigo     0.71
THEM        0.71
them        0.71
Activations Density: 0.006%