INDEX
Explanations
mentions of family members, particularly brothers
mentions of siblings, specifically brothers
Negative Logits
Token        Logit
Population   -0.73
argon        -0.73
ifact        -0.72
Effect       -0.70
protected    -0.69
pmwiki       -0.67
USE          -0.66
aceous       -0.62
mberg        -0.62
issions      -0.62
Positive Logits
Token        Logit
hood         1.32
brother      1.22
brother      1.06
brothers     1.05
heses        0.95
nephew       0.94
friend       0.93
hesis        0.90
uncle        0.88
sibling      0.88
Activations Density: 0.010%