INDEX
Explanations
the word "like" being used in a negative context
instances of the phrase "don't like" in various contexts
Negative Logits
alm         -0.80
vantage     -0.77
utical      -0.77
ositories   -0.76
TAIN        -0.75
rontal      -0.74
hern        -0.73
elin        -0.73
chin        -0.72
abetic      -0.72
Positive Logits
lihood      1.07
nor         0.88
anything    0.83
liness      0.83
anybody     0.82
anymore     0.79
ably        0.77
liest       0.75
surprises   0.73
any         0.72
Activations Density 0.034%
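
For readers unfamiliar with the density figure: it is usually read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions with a nonzero activation.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 34 of which activate the feature.
sample = [0.0] * 99_966 + [1.5] * 34
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.034%
```

Under this reading, a density of 0.034% means the feature is active on roughly 34 of every 100,000 tokens, which fits a feature tied to a narrow phrase such as "don't like".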