INDEX
Explanations
negative or unfavorable sentiments expressed in the text
Negative Logits
ries        -0.16
osu         -0.15
akov        -0.15
ponsive     -0.15
asper       -0.14
eree        -0.14
irler       -0.14
heimer      -0.14
reira       -0.14
orbit       -0.14
Positive Logits
ly          0.21
iously      0.19
LY          0.17
ically      0.16
.ly         0.16
ely         0.15
brace       0.15
fully       0.15
uly         0.14
etheless    0.14
Activations Density 0.917%