INDEX
Explanations
phrases indicating actions that are performed or not performed
instances of the word "did."
Negative Logits
Token      Logit
Methods    -0.71
liner      -0.70
Tier       -0.70
Tai        -0.69
washer     -0.68
case       -0.67
oided      -0.67
Handling   -0.66
stood      -0.66
bent       -0.66
Positive Logits
Token      Logit
actic      0.99
pez        0.88
ĸļ         0.84
not        0.82
confir     0.81
indeed     0.77
oms        0.75
anos       0.74
ppel       0.74
manage     0.74
Activations Density 0.079%
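
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The page does not state its exact definition, so the sketch below is only an assumption: the function name `activation_density`, the zero threshold, and the sample counts are all hypothetical, chosen so the arithmetic lands on the same order of magnitude.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition; the dashboard does not say how its density is computed.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 79 of which activate the feature.
sample = [0.0] * 99_921 + [0.5] * 79
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it is silent on almost every token and fires only on a narrow set of contexts.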