INDEX
Explanations
violations of laws or regulations
phrases related to violations of laws or rights
Negative Logits
onna       -0.86
mad        -0.82
affer      -0.75
deb        -0.74
venture    -0.69
FORE       -0.67
issue      -0.67
timer      -0.66
rikes      -0.66
ael        -0.66
Positive Logits
unfocusedRange    0.96
orius             0.76
İĭ                0.74
violating         0.68
ingly             0.67
seless            0.66
violations        0.65
Behavior          0.64
stereotypes       0.64
Nielsen           0.63
Activations Density: 0.080%