INDEX
Explanations
proper nouns
the word "had" in various contexts
Negative Logits
bie       -0.66
âϦ       -0.65
bery      -0.65
orph      -0.64
PI        -0.64
alias     -0.64
coin      -0.62
owe       -0.61
forward   -0.60
hack      -0.60
Positive Logits
been         1.11
undergone    1.03
iths         1.01
begun        0.97
gotten       0.91
previously   0.91
gone         0.91
originally   0.90
hoped        0.87
flown        0.79
Activations Density: 0.156%