INDEX
Explanations
mentions of the word "father."
Negative Logits
ample    -0.70
pots     -0.65
enda     -0.65
waves    -0.63
adas     -0.61
boxes    -0.61
ways     -0.60
ably     -0.59
yz       -0.58
Redux    -0.58
Positive Logits
father         3.45
dad            2.73
fathers        2.58
grandfather    2.33
mother         2.32
Father         2.17
Father         2.16
father         2.10
dads           2.10
parents        2.06
Activations Density: 0.021%