INDEX
Explanations
references to family members
mentions of parental figures or relationships
Negative Logits
hikers       -0.71
intensity    -0.70
uve          -0.66
owler        -0.65
clusions     -0.63
IFF          -0.63
alist        -0.62
tension      -0.62
xual         -0.62
sqor         -0.61
Positive Logits
ancest       0.83
died         0.82
divorced     0.82
uncle        0.79
ma           0.79
ancestor     0.76
ancestors    0.73
uncle        0.73
hood         0.72
husband      0.72
Activations Density: 0.087%