Explanations
references to law enforcement or police-related incidents
Negative Logits
Token      Logit
————       -0.80
module     -0.78
along      -0.74
Proced     -0.73
asus       -0.68
anni       -0.67
anon       -0.66
idency     -0.66
emale      -0.66
qqa        -0.64
Positive Logits
Token       Logit
slightest   0.97
curs        0.86
watered     0.81
smallest    0.77
slight      0.74
superf      0.74
modest      0.73
mundane     0.71
trivial     0.70
occasional  0.69
Activations Density 0.922%