Explanations
negative sentiments or phrases expressing disapproval
Negative Logits
tones    -0.08
본    -0.08
itzer    -0.07
blogs    -0.07
bot    -0.07
ÃŃsk    -0.07
_else    -0.07
æİĴåIJį    -0.07
á»ijc    -0.07
ึ    -0.07
Positive Logits
relief    0.08
Relief    0.07
'    0.06
ayout    0.06
no    0.06
evidence    0.06
silver    0.06
‘    0.06
place    0.06
winners    0.06
Activations Density 0.014%