Explanations
instances of police-related actions and compliance with law enforcement commands
Negative Logits
Token        Logit
akis         -0.16
aby          -0.16
icone        -0.15
ailles       -0.15
architekt    -0.15
weathermap   -0.15
undermin     -0.15
enin         -0.14
/apis        -0.14
irth         -0.14
Positive Logits
Token         Logit
slowly        0.23
Slow          0.22
cooperation   0.22
step          0.21
Freeze        0.20
complied      0.20
cooper        0.20
slow          0.20
freeze        0.19
vac           0.19
Activations Density: 0.050%