Explanations
references to family relationships
mentions of relatives and family connections
Negative Logits
oker         -0.73
ane          -0.72
OUT          -0.71
WM           -0.71
argon        -0.70
GH           -0.70
Effective    -0.70
ebook        -0.70
oted         -0.69
ventory      -0.69
Positive Logits
relatives     1.25
ilial         0.87
cousins       0.85
ancestors     0.85
folk          0.82
arrangements  0.82
citiz         0.82
aunt          0.81
siblings      0.80
osponsors     0.79
Activations Density: 0.010%