INDEX
Explanations
references to parental roles and responsibilities
Negative Logits
wner        -0.19
rd          -0.17
ibold       -0.17
werk        -0.15
stants      -0.15
wat         -0.15
↵ ↵ ↵ ↵     -0.15
winter      -0.14
sar         -0.14
walk        -0.14
Positive Logits
eral        0.32
-child      0.25
thood       0.22
esco        0.22
親          0.21
erals       0.20
ally        0.19
::__        0.18
age         0.17
ents        0.17
Activations Density 0.045%