INDEX
Explanations
references to familial relationships, specifically fathers and sons
Negative Logits
tures         -0.16
oola          -0.15
rung          -0.15
vat           -0.15
ursal         -0.15
Streams       -0.15
ibel          -0.14
amat          -0.14
tml           -0.14
_firestore    -0.14
Positive Logits
hood          0.35
ly            0.31
land          0.29
-da           0.29
-figure       0.26
figure        0.24
ially         0.23
less          0.23
ing           0.23
親            0.22
Activations Density 0.048%