INDEX
Explanations
phrases related to attribution or significance
terms related to violations or significant impacts on rights and integrity
Negative Logits
enne      -0.75
yd        -0.64
enium     -0.63
affected  -0.63
uve       -0.62
fam       -0.62
assies    -0.62
edo       -0.60
engers    -0.60
oqu       -0.59
Positive Logits
an           0.98
a            0.97
heresy       0.93
something    0.85
salvation    0.85
betrayal     0.83
blasphemy    0.81
obstruction  0.80
collusion    0.79
complicity   0.79
Activations Density: 0.079%