INDEX
Explanations
instances of the word "did."
the word "did" and its variations indicating actions performed
Negative Logits
case      -0.68
liner     -0.68
stood     -0.68
washer    -0.66
Methods   -0.66
Handling  -0.66
fields    -0.65
bent      -0.65
arb       -0.64
Offense   -0.63
Positive Logits
actic     1.04
pez       0.97
confir    0.81
oms       0.81
manage    0.81
indeed    0.79
not       0.79
ĸļ        0.78
ppel      0.74
nt        0.74
Activations Density 0.077%