INDEX
Explanations
words related to law enforcement officers
references to agents involved in various contexts
Negative Logits
Token      Logit
issance    -0.94
lihood     -0.76
PUT        -0.71
nings      -0.70
cept       -0.69
yss        -0.68
nz         -0.68
etooth     -0.68
FACE       -0.65
tty        -0.64
Positive Logits
Token      Logit
agent      1.24
agents     1.20
Agent      1.07
agent      1.04
agents     1.04
Agents     1.01
prov       0.93
ilage      0.89
Agent      0.84
Sov        0.83
Activations Density: 0.018%
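
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation exceeds zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the feature fires.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 10,000 tokens.
sample = [0.0] * 9998 + [3.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% would mean the feature fires on roughly 18 of every 100,000 sampled tokens.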