INDEX
    Explanations

    references to agents in various contexts

    Negative Logits
    "оби"       -0.19
    "erras"     -0.18
    "ara"       -0.15
    "Dump"      -0.15
    " Dive"     -0.14
    "reek"      -0.14
    "ANJI"      -0.14
    "borough"   -0.14
    "ân"        -0.14
    "tte"       -0.14
    Positive Logits
    ".Agent"     0.18
    "iated"      0.15
    "otts"       0.15
    "inals"      0.15
    "hire"       0.15
    "ees"        0.15
    "urdy"       0.14
    "nesty"      0.14
    "eut"        0.14
    "wire"       0.14
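    The page does not say how these two lists are produced. Assuming this is a sparse-autoencoder feature dashboard, a common approach (an assumption here, not confirmed by the source) is to project the feature's decoder direction onto the model's unembedding matrix and report the tokens with the largest and smallest resulting logit effects. The sketch below illustrates that assumption in pure Python; the function name top_logit_effects and the toy numbers are hypothetical.

```python
def top_logit_effects(decoder_direction, unembedding, vocab, k=10):
    """Hypothetical sketch: rank tokens by the feature's direct logit effect.

    decoder_direction: list of d_model floats (assumed feature decoder row).
    unembedding: d_model rows, each a list of vocab_size floats (assumed W_U).
    vocab: list of vocab_size token strings.
    Returns (positive, negative) lists of (token, effect) pairs.
    """
    vocab_size = len(vocab)
    d_model = len(decoder_direction)
    # Logit effect for token t is the dot product of the direction with column t of W_U.
    effects = [
        sum(decoder_direction[i] * unembedding[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]
    # Sort token ids from most negative to most positive effect.
    ranked = sorted(range(vocab_size), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in ranked[:k]]
    positive = [(vocab[t], effects[t]) for t in ranked[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens.
direction = [1.0, 0.5]
w_u = [[0.2, -0.1, 0.0, 0.3],
       [0.4, 0.2, -0.6, 0.0]]
tokens = [".Agent", "hire", "оби", "wire"]
pos, neg = top_logit_effects(direction, w_u, tokens, k=2)
print("positive:", pos)
print("negative:", neg)
```

    Real dashboards may also fold in a final layer norm or other scaling before the unembedding; the sketch leaves that out.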
    Activation Density 0.010% (about 1 in 10,000 tokens)
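    Activation density is read here as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the page does not state the threshold or sample size, and the function name activation_density is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: fraction of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# One firing token out of 10,000 gives the density shown on this page.
acts = [0.0] * 9999 + [2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```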

    No Known Activations