Explanations
son of parents or ancestors
Negative Logits
Token       Logit
unborn      -0.10
erate       -0.10
YPE         -0.09
ä¸Ģç§į      -0.09
student     -0.08
непÑĢи      -0.08
onic        -0.08
å£          -0.08
YP          -0.08
ype         -0.08
Positive Logits
Token       Logit
parents     0.18
Parents     0.14
Parents     0.14
parents     0.13
bitch       0.11
immigrants  0.11
ither       0.10
abusive     0.10
union       0.10
minor       0.09
Activations Density 0.074%