Explanations
instances of denial or refutation statements
Negative Logits
incinn     -0.79
tnc        -0.76
tn         -0.76
tags       -0.72
inki       -0.71
gone       -0.70
aptic      -0.69
rouse      -0.68
emetery    -0.68
igsaw      -0.66
Positive Logits
wrongdoing        1.12
accusations       0.95
allegations       0.91
involvement       0.86
outright          0.83
denying           0.81
charges           0.79
innocence         0.77
responsibility    0.77
paternity         0.76
Activations Density 0.018%
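
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name activation_density, the threshold parameter, and the sample data are hypothetical and not taken from the source page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 18 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(18):
    acts[i * 5000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

With these made-up numbers the ratio is 18 / 100,000, which is the same order of sparsity as the reported value: the feature is silent on almost every token.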