INDEX
Explanations
mentions of alleged or related allegations
terms related to accusations or allegations
Negative Logits
OVA      -0.77
live     -0.75
bern     -0.72
ajo      -0.71
vet      -0.71
haar     -0.71
eyes     -0.70
patch    -0.70
ben      -0.70
ARCH     -0.70
Positive Logits
allegations    0.90
accuser        0.82
accusations    0.81
disclosures    0.76
alleged        0.76
abuses         0.75
misrepresent   0.73
abuser         0.73
accused        0.73
alleges        0.72
Activations Density 0.015%