Explanations
mentions of the word “youngest”
references to age, specifically the youngest individuals in various contexts
Negative Logits
Token         Logit
fe            -0.67
works         -0.63
similar       -0.63
embed         -0.62
redirect      -0.60
neutral       -0.60
ject          -0.57
User          -0.56
guarantees    -0.56
fab           -0.55
Positive Logits
Token         Logit
youngest      3.75
eldest        2.77
oldest        2.40
younger       1.81
young         1.57
tallest       1.57
newest        1.54
Younger       1.54
daughters     1.50
smallest      1.49
Activations Density: 0.007%