Explanations
mentions of relationships, particularly between spouses or partners
Negative Logits
berger       -0.16
masturb      -0.16
ancestor     -0.15
rella        -0.15
amedi        -0.15
puberty      -0.15
Boys         -0.15
ancestor     -0.15
ruk          -0.14
ancestral    -0.14
Positive Logits
wife         0.93
husband      0.80
Wife         0.79
wife         0.77
wives        0.75
-wife        0.73
spouse       0.73
Husband      0.68
spouses      0.67
妻           0.66
Activations Density 0.548%