INDEX
Explanations
proper nouns
instances of the word "had"
Negative Logits
coin     -0.64
âϦ       -0.63
PI       -0.63
bie      -0.60
alias    -0.59
—-       -0.59
Gi       -0.58
hack     -0.58
*****    -0.58
bery     -0.58
Positive Logits
been          1.12
undergone     1.08
iths          1.02
begun         0.97
previously    0.96
originally    0.94
gone          0.90
gotten        0.86
taken         0.83
flown         0.83
Activations Density 0.160%