INDEX
Explanations
phrases related to legal allegations
instances of claims or accusations being made
Negative Logits
cffffcc     -0.86
ammy        -0.74
heartedly   -0.65
screening   -0.64
outing      -0.63
spot        -0.61
talk        -0.60
Unch        -0.59
outed       -0.57
captcha     -0.55
Positive Logits
ments       0.89
alleges     0.87
reau        0.83
lements     0.78
MENT        0.76
zynski      0.76
TAIN        0.75
cing        0.75
allege      0.74
olate       0.73
Activations Density: 0.048%