INDEX
Explanations
Instances of the word "had".
Negative Logits
PI          -0.67
bery        -0.66
owe         -0.65
orph        -0.64
ethy        -0.64
ety         -0.63
âϦ          -0.62
bie         -0.60
anymore     -0.59
forward     -0.59
Positive Logits
been         1.01
iths         1.01
originally   0.98
previously   0.97
begun        0.97
hoped        0.95
undergone    0.94
initially    0.83
gotten       0.82
flown        0.79
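The page does not say how the logit lists are produced. A common approach is to take the dot product of the feature's decoder direction with each token's unembedding vector and then list the extremes. The sketch below assumes that approach. `top_logit_effects`, `decoder_vec`, and `unembed` are hypothetical names, and the toy inputs are illustrative rather than taken from this feature.

```python
from typing import Dict, List, Tuple

def top_logit_effects(
    decoder_vec: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score every token by the dot product of the
    feature's decoder direction with that token's unembedding vector."""
    scores = {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }
    # Sort ascending by score.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # lowest-scoring tokens, most negative first
    positive = ranked[::-1][:k]  # highest-scoring tokens, most positive first
    return negative, positive

# Toy example with made-up two-dimensional vectors.
neg, pos = top_logit_effects(
    [1.0, 0.5],
    {"tok_a": [1.0, 0.02], "tok_b": [-0.5, -0.18], "tok_c": [-0.6, -0.14]},
    k=2,
)
print("negative:", neg)
print("positive:", pos)
```

With a real vocabulary of tens of thousands of tokens, the two lists do not overlap. With this three-token toy input, the positive list still includes a negatively scored token.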
Activation density: 0.139%
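Activation density is usually read as the percentage of token positions on which the feature fires. The page does not state the threshold or the token sample it used. The sketch below assumes "fires" means an activation strictly above zero. `activation_density` is a hypothetical helper. At 0.139%, the feature would fire on roughly 139 of every 100,000 tokens.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0.0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 139 firing positions out of 100,000 tokens.
print(activation_density([1.0] * 139 + [0.0] * 99_861))
```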