INDEX
Explanations
instances where someone is attempting to escape or avoid consequences
phrases about evading consequences or accountability
Negative Logits
wash      -0.79
tenance   -0.75
wake      -0.74
mart      -0.73
link      -0.70
she       -0.68
lehem     -0.68
scape     -0.64
stem      -0.63
main      -0.62
Positive Logits
impunity   0.92
murder     0.68
dignity    0.67
aceous     0.64
Mankind    0.63
unequ      0.63
reckless   0.63
standing   0.62
murdering  0.61
Murder     0.60
Activations Density: 0.059%