    Explanations

    phrases related to taking action or support for a cause

    Negative Logits
    Token              Logit
    "Downloadha"       -0.67
    " vend"            -0.63
    "geries"           -0.63
    " similarities"    -0.62
    "ibaba"            -0.61
    " vulner"          -0.60
    "gery"             -0.60
    " fung"            -0.60
    " unden"           -0.59
    " therape"         -0.57

    Positive Logits
    Token              Logit
    " guiActive"       0.70
    " Yourself"        0.69
    " Observer"        0.67
    "elle"             0.60
    "bleacher"         0.60
    "icer"             0.59
    "leon"             0.59
    " listener"        0.59
    "uler"             0.58
    "ゼウス"           0.58

    Act Density 0.068%

    No Known Activations