INDEX
Explanations
references to parent-child relationships
Negative Logits
Token       Logit
rd          -0.17
wner        -0.16
wives       -0.15
lÃŃÄį       -0.15
ibold       -0.15
rades       -0.15
sar         -0.14
ço          -0.14
calar       -0.14
↵ ↵ ↵ ↵     -0.14
Positive Logits
Token       Logit
eral        0.35
age         0.28
-child      0.28
esco        0.24
erals       0.23
thood       0.22
-da         0.21
親          0.21
::__        0.20
ially       0.20
Activations Density 0.053%
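
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a feature "fires" when its activation exceeds `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 10,000 token positions, the feature fires on 5 of them.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

A density of 0.053% under this reading would mean the feature is active on roughly 5 out of every 10,000 tokens, which is typical of a sparse, narrowly scoped feature.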