Explanations
positive opinions or sentiments
expressions of belief or opinion
NEGATIVE LOGITS
è¦ļéĨĴ      -0.74
destro      -0.69
adra        -0.66
dictates    -0.66
aughters    -0.66
×Ļ          -0.66
untled      -0.64
sidx        -0.64
ç«          -0.63
WER         -0.63
POSITIVE LOGITS
innocuous   0.73
joking      0.72
kindred     0.72
invincible  0.71
gonna       0.68
harmless    0.66
funny       0.66
unbeat      0.65
kidding     0.65
cute        0.64
Activations Density 0.179%