INDEX
    Explanations

    words indicating proactive behavior or actions

    Negative Logits
    ujet     -0.20
    bial     -0.19
    rub      -0.18
    bote     -0.17
    uable    -0.16
    bil      -0.16
    midi     -0.16
    rne      -0.16
    bis      -0.16
    b        -0.16
    Positive Logits
    tracted   0.29
    ffer      0.28
    actively  0.27
    pped      0.26
    ponent    0.25
    logue     0.25
    verbs     0.24
    wl        0.23
    ccess     0.23
    gres      0.23
    Activation Density: 0.008%

    No Known Activations