Explanations
denial statements
instances of denial regarding accusations or allegations
Negative Logits
Token                             Logit
incinn                            -0.87
GROUP                             -0.78
emetery                           -0.77
Else                              -0.73
rouse                             -0.72
================================  -0.70
aptic                             -0.69
clone                             -0.69
Ranked                            -0.69
ARCH                              -0.68
Positive Logits
Token         Logit
denies        0.89
deny          0.82
vehemently    0.80
outright      0.79
denial        0.78
wrongdoing    0.78
extradition   0.76
denied        0.75
admission     0.74
denying       0.74
Activations Density: 0.025%