INDEX
Explanations
references to agents in various contexts
NEGATIVE LOGITS
token      logit
оби        -0.19
erras      -0.18
ara        -0.15
Dump       -0.15
Dive       -0.14
reek       -0.14
ANJI       -0.14
borough    -0.14
ân         -0.14
tte        -0.14
POSITIVE LOGITS
token      logit
.Agent     0.18
iated      0.15
otts       0.15
inals      0.15
hire       0.15
ees        0.15
urdy       0.14
nesty      0.14
eut        0.14
wire       0.14
Activations Density: 0.010%