Explanations
negations and qualifications present in sentences
negations or denial expressions
Negative Logits
Token      Logit
lance      -0.76
plex       -0.73
place      -0.67
plates     -0.67
itor       -0.65
ixel       -0.65
hower      -0.64
iery       -0.63
eteenth    -0.63
(){        -0.62
Positive Logits
Token         Logit
outright      1.07
necessarily   0.89
downright     0.86
icably        0.84
eworthy       0.76
technically   0.74
mistaken      0.70
impossible    0.70
extermin      0.70
intimidated   0.69
Activation density: 0.031%