INDEX
Explanations
the word "Agent" or related variations
references to agents, especially in a context related to intelligence or espionage
NEGATIVE LOGITS
lihood     -1.07
issance    -0.94
FACE       -0.75
etooth     -0.75
ths        -0.74
ĸļ         -0.71
orld       -0.70
phrine     -0.69
cept       -0.66
nz         -0.65
POSITIVE LOGITS
prov        0.99
agent       0.90
Agent       0.88
agent       0.87
sov         0.84
agents      0.79
agents      0.78
inates      0.78
iola        0.75
anova       0.75
Activations Density 0.034%
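One way to read the density figure is as the percentage of token positions on which the feature fires. The sketch below assumes that definition (activation strictly above a threshold of 0). The function name `activation_density` and the 100,000-token sample are hypothetical and are used only to show the arithmetic: 34 active positions out of 100,000 tokens gives 0.034%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens where the feature is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 tokens, with the feature active on 34 of them.
acts = [0.0] * 100_000
for i in range(34):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

print(f"{activation_density(acts):.3f}%")  # 0.034%
```

At this density the feature fires on roughly one token in three thousand, which is consistent with a narrow concept such as the word "agent".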