Explanations
negative sentiments and expressions of distress
Negative Logits
Token     Logit
latter    -0.24
unter     -0.21
я         -0.19
a         -0.18
inx       -0.16
того      -0.16
d         -0.16
y         -0.16
e         -0.15
ussian    -0.15
Positive Logits
Token     Logit
/-        0.20
wards     0.18
webkit    0.15
rc        0.15
ness      0.15
ting      0.15
wealth    0.14
urile     0.14
vest      0.14
etheless  0.14
Activation density: 0.215%