INDEX
Explanations
proper nouns related to familial relationships (e.g., Father, John)
references to father figures and paternal relationships
Negative Logits
mble                -0.79
ellen               -0.73
externalToEVAOnly   -0.68
EVA                 -0.67
pace                -0.65
hijab               -0.63
Women               -0.63
burgh               -0.62
lled                -0.62
ibr                 -0.61
Positive Logits
hood                1.15
volent              0.99
patriarch           0.98
father              0.89
figure              0.84
hetical             0.82
hesis               0.82
liest               0.77
land                0.76
less                0.72
Activations Density: 0.063%