INDEX
Explanations
contractions of "would not" or "could not" in text
expressions of negation or refusal
Negative Logits
Cosponsors    -0.64
«ĺ            -0.64
ishing        -0.62
ÃŁ            -0.62
è¦ļéĨĴ        -0.60
\<            -0.59
active        -0.58
Bench         -0.57
xxxx          -0.57
Fil           -0.56
Positive Logits
ĸļ            0.80
tumble        0.70
offend        0.69
dearly        0.64
«             0.64
oola          0.63
fitting       0.62
unthinkable   0.62
itate         0.61
schild        0.61
Activations Density: 0.219%