    Explanations

    words related to causing or triggering a specific outcome or state

    terms associated with the concept of induction or causing an effect

    Negative Logits
    Token                 Logit
    "achu"                -0.70
    "apest"               -0.69
    "asketball"           -0.69
    "iffin"               -0.67
    "lain"                -0.65
    "=-=-=-=-=-=-=-=-"    -0.65
    "roud"                -0.65
    "phan"                -0.64
    "andon"               -0.64
    "mis"                 -0.64
    Positive Logits
    Token                 Logit
    " induced"            1.14
    " induce"             1.00
    " inducing"           0.94
    " induction"          0.92
    " induces"            0.87
    "induced"             0.87
    " analges"            0.85
    "uced"                0.85
    "uces"                0.84
    "untarily"            0.84
    Activation Density: 0.012%

    No Known Activations