INDEX
Explanations
family relationships, specifically siblings
references to familial relationships, particularly focusing on brothers and sisters
Negative Logits
ifact        -0.70
acent        -0.70
pmwiki       -0.66
isting       -0.65
Population   -0.65
issions      -0.64
eneg         -0.64
ancing       -0.63
agnetic      -0.63
erers        -0.62
Positive Logits
hood         1.52
ly           0.91
brothers     0.89
Nath         0.81
Uriel        0.80
patriarch    0.79
brother      0.77
liness       0.76
beard        0.75
brother      0.74
Activations Density: 0.039%