    Explanations

    phrases related to taking action or exerting control in a forceful manner

    Negative Logits
    Token         Logit
    " IMAGES"     -0.77
    "lihood"      -0.74
    "enegger"     -0.73
    " abund"      -0.68
    " Nations"    -0.64
    " Values"     -0.63
    "xual"        -0.62
    " Plenty"     -0.62
    "IVES"        -0.62
    " Aires"      -0.61
    Positive Logits
    Token         Logit
    "tered"       1.53
    "tering"      1.40
    "ters"        1.07
    "down"        1.04
    "tle"         1.04
    "downs"       1.02
    " down"       0.99
    "outs"        0.91
    "out"         0.90
    "ulence"      0.90
    Activation Density: 0.023%
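
    The page does not say how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 100,000 token positions, 23 of them active.
acts = [0.0] * 99_977 + [1.5] * 23
print(f"{activation_density(acts):.3f}%")
```

    Under this reading, a density of 0.023% corresponds to roughly 23 active positions per 100,000 tokens, which would make this a very sparse feature.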

    No Known Activations