INDEX
Explanations
the word "not" followed by a strong emphasis on the subsequent word or phrase
negations or phrases indicating refusal
Negative Logits
Token     Logit
å¥        -0.85
ounters   -0.73
cano      -0.69
æ©        -0.67
ãĤ¼       -0.65
Hots      -0.65
ously     -0.65
peaks     -0.64
oise      -0.63
çļ        -0.63
Positive Logits
Token         Logit
bothered      1.29
ashamed       1.25
afraid        1.24
interested    1.22
gonna         1.21
kidding       1.21
aware         1.10
necessarily   1.10
fooled        1.06
worried       1.05
Activations Density 0.112%
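
The density figure above can be read as the share of sampled token positions on which the feature fires. The source does not define it, so the sketch below assumes that reading (activation strictly above a threshold of zero). The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; the dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 112 active positions out of 100,000 tokens.
sample = [1.0] * 112 + [0.0] * 99_888
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.112%
```

A density this low means the feature is sparse: under the assumed definition it is active on roughly one token in every 900.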