Explanations
mentions of family relationships, particularly the word "cousin"
references to familial relationships, specifically terms related to cousins
Negative Logits
Token       Logit
inth        -0.84
hner        -0.79
inen        -0.75
inem        -0.74
anwhile     -0.73
yss         -0.70
ights       -0.70
overe       -0.69
ּ           -0.69
arching     -0.66
Positive Logits
Token       Logit
cousin      0.93
cousins     0.92
aunt        0.89
uncle       0.89
nephew      0.87
niece       0.86
hesis       0.85
hood        0.79
incest      0.73
uncle       0.71
Activations Density 0.010%