INDEX
Explanations
the presence of the word "don't."
Negative Logits
wrapper       -0.63
antioxid      -0.61
Reborn        -0.61
Reloaded      -0.60
language      -0.60
Houses        -0.59
ancest        -0.58
Greenberg     -0.58
stabilized    -0.58
capacities    -0.58
Positive Logits
hesitate        1.11
bother          1.07
forget          1.07
underestimate   0.94
worry           0.89
expect          0.88
discriminate    0.84
erest           0.84
Í               0.81
intend          0.80
Activations Density: 0.046%