Explanations
references to family members, particularly sons
mentions of the word "son."
Negative Logits
veyard            -0.70
freight           -0.66
Union             -0.65
iculty            -0.64
PORT              -0.63
icial             -0.59
ords              -0.59
manifold          -0.59
accommodations    -0.59
TING              -0.58
Positive Logits
Gohan             0.97
hood              0.91
nets              0.88
hesis             0.87
ogram             0.87
pins              0.83
hetically         0.83
mares             0.79
son               0.79
Barron            0.78
Activations Density 0.025%