Explanation
Mentions of, or references to, allegations or accusations
Negative logits (token, logit)
scape      -0.71
rain       -0.68
hn         -0.67
ool        -0.65
owa        -0.64
Prep       -0.61
Optim      -0.61
alike      -0.60
arity      -0.60
alone      -0.59
Positive logits (token, logit)
alleged        3.55
purported      2.17
allegedly      2.07
accused        1.89
alleges        1.88
allege         1.85
allegation     1.82
suspected      1.79
allegations    1.74
purportedly    1.61
Activation density: 0.015%