INDEX
Explanations
mentions of family relationships, particularly siblings
the term "brother" and its variations in context
Negative Logits
Token         Logit
acent         -0.74
ifact         -0.72
ocobo         -0.67
eneg          -0.66
erers         -0.66
ACA           -0.64
veyard        -0.64
issions       -0.64
tten          -0.64
Population    -0.63
Positive Logits
Token         Logit
hood          1.49
brothers      0.91
ly            0.89
Nath          0.82
patriarch     0.80
brother       0.78
liness        0.77
brother       0.77
Brother       0.75
Uriel         0.74
Activations Density 0.039%