INDEX
Explanations
the word "don't"
the presence of the contraction "don't."
Negative Logits
itiz        -0.70
Penguin     -0.66
Presence    -0.66
spirited    -0.63
Reloaded    -0.61
Alleg       -0.61
Sparrow     -0.60
Reviewer    -0.59
Laun        -0.58
çĦ          -0.58
POSITIVE LOGITS
necessarily   1.19
bother        1.07
know          1.04
hesitate      1.01
discriminate  0.99
deserve       0.98
appreciate    0.96
seem          0.95
intend        0.93
belong        0.93
Activations Density 0.097%