Explanations
references to violent incidents involving law enforcement
Negative Logits
udoku       -0.18
785         -0.17
имв         -0.15
/Area       -0.14
984         -0.14
aterno      -0.14
rophy       -0.14
.eclipse    -0.14
ascar       -0.14
acman       -0.14
Positive Logits
(token not shown)    0.16
isha        0.15
resist      0.15
igen        0.15
agger       0.15
/****************************************************************************↵    0.15
Err         0.14
fuse        0.14
Resist      0.14
irsch       0.13
Activations Density 0.020%