INDEX
Explanations
mentions of violations or violators
words related to violations and offenders
Negative Logits
Token               Logit
dress               -0.83
erald               -0.73
mares               -0.73
dash                -0.68
ointment            -0.67
pread               -0.67
guiActiveUnfocused  -0.66
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.63
attrition           -0.62
iewicz              -0.62
Positive Logits
Token               Logit
viol                1.09
Viol                1.03
viol                1.00
Viol                0.88
amental             0.87
violin              0.86
encia               0.82
ata                 0.79
icious              0.78
uous                0.78
Activations Density: 0.007%
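
To put the density figure in context, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and serve only as illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 100,000 token positions.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.007% would correspond to the feature firing on roughly 7 of every 100,000 tokens.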