INDEX
Explanations
words related to undercover activities
references to undercover operations and agents
Negative Logits
cing     -0.80
È        -0.74
RR       -0.72
plex     -0.71
UTERS    -0.70
Maw      -0.69
HCR      -0.68
gran     -0.68
hetti    -0.67
士       -0.66
Positive Logits
undercover    1.26
informant     1.09
infiltr       0.89
mole          0.88
informants    0.86
sting         0.86
spying        0.80
uncover       0.75
covert        0.75
posing        0.73
Activations Density: 0.008%