INDEX
    Explanations

    phrases related to behaviors or actions

    instances of the word "actions" in relation to moral responsibility or consequences

    Negative Logits
    Token          Logit
    "bid"          -0.73
    "BLE"          -0.65
    " AES"         -0.65
    "mbuds"        -0.64
    " Dise"        -0.64
    "orf"          -0.62
    "inately"      -0.62
    "ondo"         -0.61
    "used"         -0.61
    "definition"   -0.60
    Positive Logits
    Token        Logit
    "uations"    1.05
    " ACTIONS"   0.99
    "uate"       0.98
    " actions"   0.95
    "uated"      0.89
    "uation"     0.89
    "uary"       0.86
    "hops"       0.85
    "uating"     0.84
    "ives"       0.81
    Activation Density: 0.025%

    No Known Activations