Explanations
possessive pronouns indicating ownership or belonging
Negative Logits
himself      -0.80
himself      -0.61
infarction   -0.56
которому     -0.55
xk           -0.52
is           -0.52
فيلم         -0.51
Obispo       -0.51
phalt        -0.50
stenosis     -0.49
Positive Logits
their        1.94
Their        1.87
Their        1.77
their        1.74
THEIR        1.72
themselves   1.54
thier        1.47
leurs        1.34
themselves   1.34
Leur         1.24
Activations Density 0.093%