Explanations
Mentions of family members, particularly sisters, in various contexts
Mentions of the word "sister"
Negative Logits
Token          Logit
ulkan          -0.71
ateurs         -0.64
beit           -0.64
imal           -0.63
vt             -0.62
urable         -0.61
holes          -0.61
animate        -0.60
ambo           -0.60
hematically    -0.60
Positive Logits
Token          Logit
sister          3.63
sisters         2.53
brother         2.39
sibling         2.30
niece           2.03
cousin          2.02
Sister          1.96
daughter        1.95
siblings        1.85
aunt            1.71
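The logit tables list the output tokens whose logits this feature pushes up or down most strongly. Dashboards of this kind typically obtain such scores by projecting the feature's decoder direction onto the model's unembedding matrix (a direct logit attribution, usually ignoring the final layer norm) and keeping the top and bottom k tokens. The sketch below assumes that procedure; `decoder_dir`, `unembed`, `logit_effects`, and `top_logits` are hypothetical names, and the toy vectors are made up, not the values shown in the tables.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder
    direction with that token's unembedding vector.

    Assumption: no final layer norm or bias is applied, which is a
    common simplification for this kind of logit table.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k highest scores, k lowest scores), each sorted by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # lowest scores, most negative first
    positive = ranked[::-1][:k]    # highest scores, most positive first
    return positive, negative


# Hypothetical 2-d toy model, not real model weights.
effects = logit_effects(
    [1.0, 0.0],
    {"sister": [3.0, 1.0], "brother": [2.0, 0.5], "holes": [-0.5, 2.0]},
)
print(top_logits(effects, k=1))  # ([('sister', 3.0)], [('holes', -0.5)])
```

On this reading, "sister" at 3.63 means the feature's direction, when written into the residual stream, raises the "sister" logit more than any other token's.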
Activation Density: 0.007%
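Activation density is commonly reported as the percentage of tokens in a sample corpus on which the feature fires, i.e. has an activation above zero. Assuming that definition, 0.007% corresponds to roughly 7 firing tokens per 100,000, which fits a narrowly scoped "sister" feature. A minimal sketch; `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact threshold and corpus are not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: "fires" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 7 firing tokens out of 100,000.
sample = [0.0] * 99_993 + [1.0] * 7
print(activation_density(sample))
```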