Explanations
personal pronouns in different languages
Negative Logits
seamnă  -0.65
そういった  -0.51
Normdatei  -0.50
Савезне  -0.48
ureusement  -0.47
quenching  -0.46
Arund  -0.46
betical  -0.46
Portale  -0.46
intrusions  -0.45
Positive Logits
he  1.84
He  1.58
she  1.57
He  1.52
он  1.48
his  1.45
Он  1.38
但他  1.37
himself  1.37
they  1.35
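The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. Below is a minimal sketch of how such lists can be produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix, and it does not model any normalization the real tool may apply. The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: score(token) = decoder_dir . unembed[:, token]; no layer-norm scaling.

def logit_effects(decoder_dir, unembed):
    """Return one score per vocab token.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of vocab_size floats (W_U).
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens, scores, k):
    """Return (k highest-scoring, k lowest-scoring) (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with d_model = 2 and a 4-token vocabulary (made-up numbers).
tokens = ["he", "she", "the", "quenching"]
decoder_dir = [1.0, 0.5]
unembed = [
    [1.0, 0.8, 0.0, -0.4],
    [0.6, 0.9, 0.1, -0.2],
]
scores = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(tokens, scores, k=2)
print([t for t, _ in positive])  # ['he', 'she']
print([t for t, _ in negative])  # ['quenching', 'the']
```

Because only the ranking matters for these lists, the sketch sorts once and slices both ends rather than running two separate top-k searches.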
Activation density: 0.062%
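Activation density is the share of token positions on which the feature fires. At 0.062%, the feature would be active on roughly 62 of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero; the threshold the actual dashboard uses is an assumption here, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# Assumption: a feature counts as active when its activation exceeds `threshold` (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation is above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up data: 62 active positions out of 100,000.
acts = [0.0] * 99_938 + [1.2] * 62
print(f"{activation_density(acts):.3f}%")  # 0.062%
```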