INDEX
    Explanations

    phrases related to actions causing change or consequences

    actions related to transformation or change

    Negative Logits
    ancies        -0.62
    idence        -0.59
    inia          -0.56
     occupancy    -0.56
    redo          -0.54
    rise          -0.54
    dom           -0.54
    awks          -0.53
    ajo           -0.53
    brate         -0.53
    Positive Logits
     hostage      0.74
    aundering     0.73
    Ń·            0.67
    ety           0.67
    ../           0.66
     arbitrarily  0.66
    UC            0.64
    ĸļ            0.64
    YING          0.64
     by           0.64
    Act Density 0.226%

    No Known Activations