INDEX
Explanations
negative expressions or sentiments
Negative Logits
N       -0.17
h       -0.16
980     -0.16
ivr     -0.16
biên    -0.16
jt      -0.15
H       -0.15
q       -0.15
Ìĥ      -0.14
Z       -0.14
Positive Logits
rog     0.16
sWith   0.15
Us      0.15
mite    0.15
oft     0.15
ecd     0.15
bole    0.14
erial   0.14
ador    0.14
gable   0.14
Activations Density: 0.071%