INDEX
Explanations
references to parental roles and relationships
Negative Logits
rd        -0.19
wives     -0.16
líč       -0.16
亮        -0.15
bite      -0.15
wife      -0.14
wright    -0.14
768       -0.14
winter    -0.14
wner      -0.14
Positive Logits
eral      0.35
-child    0.29
age       0.25
erals     0.24
esco      0.24
親        0.23
-da       0.23
ially     0.21
thood     0.20
/gr       0.20
Activations Density 0.050%