INDEX
Explanations
pronouns referring to individuals in various languages
Negative Logits
OGND            -0.99
estekak         -0.85
pleaſure        -0.71
NSCoder         -0.69
Personendaten   -0.68
دانشنامهٔ       -0.67
fevere          -0.63
greateſt        -0.62
itſelf          -0.62
térm            -0.62
Positive Logits
he              0.98
she             0.92
She             0.86
He              0.82
THEY            0.82
они             0.80
they            0.80
I               0.77
वह              0.74
он              0.74
Activations Density: 0.137%