INDEX
Explanations
references to a specific individual or entity, particularly focusing on possessive pronouns and direct mentions of that individual
Negative Logits
Houſe       -0.70
auffi       -0.70
raiſ        -0.69
itſelf      -0.69
uſe         -0.67
cauſe       -0.67
myſelf      -0.67
pleaſure    -0.66
Theodo      -0.64
Aphrodite   -0.64
Positive Logits
his         3.85
his         3.09
His         2.77
His         2.70
HIS         2.44
himself     2.40
him         2.37
彼の        2.33
他的        2.24
he          2.23
Activations Density 0.235%
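
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration; the dashboard's actual pipeline and sample size are not given in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: "fires" means the activation is strictly above `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 20,000 token positions, 47 of them with a nonzero activation.
sample = [0.0] * 19953 + [1.5] * 47
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.235% means the feature is active on roughly 1 token in 425, which is typical of a sparse, fairly specific feature.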