INDEX
    Explanations

    phrases related to actions being taken or accepted

    instances of the word "taken" in various contexts

    Negative Logits
    Token             Logit
    "eers"            -0.72
    " reinforcement"  -0.62
    "tions"           -0.62
    "glers"           -0.61
    "vine"            -0.60
    " SPD"            -0.60
    "cers"            -0.59
    "ternity"         -0.59
    "ichick"          -0.57
    "ileaks"          -0.56

    Positive Logits
    Token             Logit
    " aback"          1.53
    " advantage"      1.10
    " care"           1.07
    "aways"           1.03
    " seriously"      0.98
    " apart"          0.91
    " hostage"        0.90
    " away"           0.89
    " orally"         0.84
    " awa"            0.82
    Activation Density: 0.036%

    No Known Activations