INDEX
Explanations
References to familial relationships, particularly brothers and other siblings
Negative Logits
})));            -0.80
Lupin            -0.77
glan             -0.75
DispatchToProps  -0.74
Judson           -0.73
smut             -0.73
USERS            -0.72
ANSI             -0.72
TPM              -0.72
NPs              -0.72
Positive Logits
brother   2.26
BROTHER   2.17
brothers  2.15
Brother   2.07
brother   2.07
Brother   2.03
Brothers  1.94
sister    1.88
brothers  1.83
Brothers  1.75
Activations Density: 0.052%