Explanations
References to anger and related emotional expressions
Negative Logits
esser  -0.16
以上  -0.14
weeney  -0.14
erver  -0.14
elli  -0.14
elling  -0.14
Олександ  -0.14
UTO  -0.14
icari  -0.13
interest  -0.13
Positive Logits
ulent  0.16
íݸ  0.15
洞  0.15
orque  0.15
fulness  0.14
FUL  0.14
Thur  0.14
unker  0.13
/conf  0.13
門  0.13
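The two lists above rank vocabulary tokens by how strongly the feature pushes their logits down or up. The dashboard does not say how these scores are computed. A common approach is to project the feature's decoder direction onto the model's unembedding matrix and sort the results, and the sketch below assumes that approach. The names decoder_direction, unembed, and vocab are hypothetical placeholders. Whether this dashboard also applies a final layer norm or any rescaling is not stated, so the sketch omits both.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by projecting a (hypothetical) feature decoder
    direction onto an unembedding matrix of shape d_model x vocab_size.

    Returns (most_negative, most_positive), each a list of (token, score).
    """
    d_model = len(decoder_direction)
    # score[j] = sum_i decoder_direction[i] * unembed[i][j]
    scores = [
        sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # ascending: most negative first
    most_positive = ranked[::-1][:k]    # descending: most positive first
    return most_negative, most_positive

# Toy example with made-up numbers (not taken from this dashboard).
neg, pos = logit_effects(
    decoder_direction=[1.0, -1.0],
    unembed=[[0.5, 0.1, -0.2], [0.1, 0.4, 0.3]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

In the toy example, token "c" scores about -0.5 and token "a" about 0.4, so each ends up at the top of its respective list.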
Activation density: 0.040%
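Activation density is usually read as the fraction of token positions on which the feature fires. The dashboard does not state its threshold. The sketch below assumes "fires" means an activation strictly greater than zero. The activation values are invented for illustration: 4 active positions out of 10,000, which matches the 0.040% figure above.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Illustrative data only: 9,996 inactive positions plus 4 active ones.
acts = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.7]
print(f"{activation_density(acts):.3%}")  # 4 / 10000 -> 0.040%
```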