Explanations
statements related to denying involvement or taking steps to prevent certain actions
statements of denial or non-involvement
Negative Logits
Token          Logit
hailed         -0.68
ufact          -0.68
nonetheless    -0.65
ptoms          -0.65
surprisingly   -0.64
beware         -0.62
uers           -0.61
plenty         -0.61
realise        -0.61
surv           -0.60
Positive Logits
Token          Logit
nor            1.21
whatsoever     1.16
anybody        0.96
partisan       0.90
anything       0.87
âĢ             0.87
[              0.85
âĢİ            0.80
â̦"            0.80
prejud         0.78
Activations Density: 0.427%