Explanations
mentions of siblings or related family terms
references to siblings and family relationships
Negative Logits
protected   -0.69
vasive      -0.67
usc         -0.65
iled        -0.64
industrial  -0.63
inent       -0.62
Paris       -0.62
tical       -0.60
Gore        -0.60
rophe       -0.60
Positive Logits
siblings    1.34
sibling     0.93
adolesc     0.87
iblings     0.86
heses       0.85
nodd        0.81
ilial       0.81
twins       0.81
ystem       0.81
cousins     0.81
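The two lists above are the vocabulary tokens with the most negative and most positive logit values for this feature. A minimal sketch of how such lists could be produced follows. It assumes, and the dashboard does not say, that each value is the feature's direct effect on an output logit, computed by projecting the feature's decoder direction onto the unembedding matrix. The helper names `logit_effect` and `top_bottom_logits` are hypothetical, and the toy data reuses a few values from the lists rather than a real model.

```python
import numpy as np

def logit_effect(decoder_row, unembed):
    """Hypothetical: project a feature's decoder direction onto the unembedding.

    decoder_row: shape (d_model,); unembed: shape (d_model, vocab_size).
    Returns one value per vocabulary token (assumed meaning of "logit").
    """
    return np.asarray(decoder_row) @ np.asarray(unembed)

def top_bottom_logits(effect, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    effect = np.asarray(effect)
    order = np.argsort(effect)  # indices sorted by ascending value
    bottom = [(vocab[i], float(effect[i])) for i in order[:k]]
    top = [(vocab[i], float(effect[i])) for i in order[::-1][:k]]
    return top, bottom

# Toy example using a few values from the lists above (not a real model).
vocab = ["siblings", "sibling", "protected", "vasive", "Paris"]
effect = np.array([1.34, 0.93, -0.69, -0.67, -0.62])
top, bottom = top_bottom_logits(effect, vocab, k=2)
print(top)     # [('siblings', 1.34), ('sibling', 0.93)]
print(bottom)  # [('protected', -0.69), ('vasive', -0.67)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a large vocabulary, `np.argpartition` would avoid a full sort.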
Activations Density 0.011%
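An activation density of 0.011% means the feature fires on roughly 11 of every 100,000 tokens, so it is a very sparse feature. The sketch below assumes, which the dashboard does not state, that density is the fraction of tokens whose activation exceeds zero. The `activation_density` helper and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature is active (activation > threshold)."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy check: 11 active tokens out of 100,000 gives a density of 0.011%.
acts = np.zeros(100_000)
acts[:11] = 0.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.011%
```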