INDEX
Explanations
denials or refutations in a text
instances of denial or disavowal regarding accusations or claims made against individuals or groups
Negative Logits
ngth            -0.89
soDeliveryDate  -0.78
Scores          -0.75
loo             -0.72
Ec              -0.68
brackets        -0.67
mac             -0.67
TW              -0.66
wal             -0.66
alid            -0.66
Positive Logits
existence       0.96
outright        0.95
exist           0.93
legitimacy      0.92
paternity       0.90
validity        0.89
slightest       0.87
consent         0.83
authenticity    0.82
spurious        0.80
Activations Density: 0.085%