INDEX
Explanations
negations or instances where actions are not being taken
negative constructions and phrases indicating lack of action or responsibility
Negative Logits
Token            Logit
CST              -0.72
eers             -0.70
soDeliveryDate   -0.70
Mehran           -0.70
Valhalla         -0.69
Cth              -0.67
Geh              -0.65
Tant             -0.65
Tribune          -0.64
Ninth            -0.63
Positive Logits
Token            Logit
accepting        0.94
paying           0.93
pleting          0.93
iating           0.93
angering         0.93
rewarding        0.90
agreeing         0.90
committing       0.89
having           0.89
engaging         0.88
Activations Density 0.189%