Explanations
references to spy-related themes and elements
Negative Logits
Token            Logit
elsen            -0.15
ç°               -0.14
manslaughter     -0.14
arel             -0.14
åħ¬åijĬ           -0.14
ılıp             -0.14
_bs              -0.13
законодав        -0.13
hir              -0.13
discriminator    -0.13
Positive Logits
Token            Logit
CIA              0.41
agent            0.38
spy              0.37
agents           0.36
Agents           0.35
-agent           0.35
Agent            0.34
spies            0.33
intelligence     0.33
agent            0.32
Activations Density 0.230%
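
As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that activation density means the fraction of sampled tokens on which the feature's activation exceeds zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# active tokens) / (# sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 23 active tokens out of 10,000 sampled tokens.
sample = [1.5] * 23 + [0.0] * 9977
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.230%
```

Under this reading, a density of 0.230% would mean the feature fires on roughly 23 of every 10,000 tokens, which is consistent with a feature tied to a narrow topic.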