INDEX
Explanations
references to familial relationships, particularly focusing on sons and daughters
Negative Logits
ibel        -0.20
incinn      -0.17
TURE        -0.17
istring     -0.16
abies       -0.16
ture        -0.15
yor         -0.15
emales      -0.15
ancestor    -0.15
tl          -0.14
Positive Logits
-in         0.31
orous       0.28
hood        0.25
eren        0.23
eral        0.21
-IN         0.20
nets        0.19
HO          0.17
less        0.17
ntag        0.17
Activations Density: 0.043%