INDEX
    Explanations

    words related to personal or societal actions or behaviors

    references to "actions" and their implications or consequences

    Negative Logits
    Token           Logit
    "bid"           -0.89
    "anus"          -0.75
    "ļéĨĴ"          -0.70
    "skinned"       -0.69
    "vu"            -0.65
    "weights"       -0.63
    " Dwell"        -0.63
    " comma"        -0.62
    "ickets"        -0.61
    "inately"       -0.61
    Positive Logits
    Token           Logit
    "uate"          1.01
    "ives"          0.94
    " actions"      0.92
    "terday"        0.91
    "uits"          0.90
    "hops"          0.88
    "uations"       0.87
    " undertaken"   0.86
    "ivism"         0.86
    " ACTIONS"      0.86
    Activation Density: 0.060%

    No Known Activations