INDEX
    Explanations

    words related to impactful actions or events

    actions that lead to significant consequences or changes

    Negative Logits
    Token             Logit
    "ug"              -0.69
    "ombat"           -0.68
    "oha"             -0.62
    "=-=-"            -0.62
    "igger"           -0.60
    "ogo"             -0.60
    "û"               -0.60
    "available"       -0.59
    "ique"            -0.59
    "peg"             -0.57
    Positive Logits
    Token             Logit
    "hler"            0.65
    " thereby"        0.62
    "angelo"          0.61
    "arks"            0.60
    " contributions"  0.60
    "cially"          0.59
    "ãĥ¥"             0.58
    " winds"          0.58
    " indu"           0.57
    " compos"         0.57
    Act Density 0.135%

    No Known Activations