INDEX
Explanations
descriptions of people's family backgrounds
phrases indicating familial relationships or lineage
Negative Logits
mble    -0.75
nels    -0.74
upid    -0.71
ickr    -0.71
imar    -0.69
nox     -0.67
CAR     -0.67
ostic   -0.66
ordan   -0.66
        -0.66
Positive Logits
hers           0.72
deceased       0.71
elder          0.70
Brother        0.69
sisters        0.67
clergy         0.67
Apostle        0.67
brothers       0.67
twins          0.66
grandmother    0.65
Activations Density: 0.091%