INDEX
    Explanations

    phrases indicating the start or initiation of an action or process

    calls to action or invitations to engage in a task

    Negative Logits
    Token                   Logit
    "iership"               -0.72
    "ires"                  -0.65
    "lied"                  -0.65
    "amiya"                 -0.62
    "ELD"                   -0.61
    " veins"                -0.61
    "KO"                    -0.61
    "elson"                 -0.60
    " externalToEVAOnly"    -0.59
    "rays"                  -0.58

    Positive Logits
    Token                   Logit
    " ourselves"            1.19
    " recap"                0.77
    "eeee"                  0.75
    " briefly"              0.72
    "reality"               0.71
    " facts"                0.70
    " pretend"              0.69
    " hindsight"            0.69
    " analogy"              0.69
    " our"                  0.69
    Act Density 0.116%

    No Known Activations