INDEX
Explanations
female pronouns for possession or action
Negative Logits
td       0.84
ic       0.80
kowe     0.77
p        0.71
鉤       0.70
Ich      0.69
Bulg     0.66
سلامه    0.66
μεγά     0.65
kli      0.65
Positive Logits
ли       0.85
ς        0.70
х        0.70
се       0.69
[blank]  0.66
ле       0.66
ز        0.66
<0x0D>   0.65
во       0.65
s        0.62
Activations Density 0.004%