    Explanations

    words related to unity and collective action

    Negative Logits
        " himself"        -0.93
        " herself"        -0.78
        "プ"              -0.60
        " lucrative"      -0.57
        " reportedly"     -0.57
        " disliked"       -0.55
        " frowned"        -0.55
        " revoked"        -0.54
        " his"            -0.53
        " infuri"         -0.52
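One of the negative tokens is the katakana character プ (U+30D7), which raw tokenizer dumps display as ãĥĹ. That display form matches GPT-2-style byte-level BPE, where every byte is mapped to a printable character. The page does not name the tokenizer, so this is an assumption. Under it, the sketch below inverts the mapping. `bytes_to_unicode` mirrors the helper of the same name in OpenAI's GPT-2 `encoder.py`; `decode_token` is a hypothetical name used only for illustration. The same scheme also explains why a leading space appears as Ġ in raw token strings.

```python
def bytes_to_unicode():
    """Byte -> printable-character table used by GPT-2-style byte-level BPE.

    Printable Latin-1 bytes map to themselves; the remaining bytes
    (0x00-0x20 including space, 0x7F-0xA0, and 0xAD) are shifted to code points 256+.
    """
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Hypothetical helper: turn a byte-level display string back into text."""
    raw = bytes(BYTE_DECODER[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĹ"))              # prints プ  (bytes E3 83 97)
print(repr(decode_token("Ġhimself")))    # prints ' himself'
```

Tokens such as " infuri" are simply partial words, so they decode cleanly but are not complete English words.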
    Positive Logits
        " ourselves"      1.83
        " our"            1.71
        "Our"             1.70
        " Our"            1.64
        " we"             1.58
        " We"             1.55
        "We"              1.54
        " OUR"            1.53
        "we"              1.44
        " ours"           1.32
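The two logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down or up. The page does not describe its method. A common approach, assumed here, multiplies the feature's decoder direction by the model's unembedding matrix and keeps the k lowest and k highest scores, ignoring any final LayerNorm scaling. In the sketch below, `top_logit_effects`, the 2-dimensional model, and every number are made up for illustration.

```python
def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct effect of a feature direction on the logits.

    decoder_dir: the feature's decoder vector, length d_model (assumed input).
    W_U: unembedding matrix given as d_model rows, each of length len(vocab).
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    d_model = len(decoder_dir)
    # score[j] = decoder_dir . W_U[:, j], the logit change for token j
    scores = [
        sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


# Toy, hypothetical model: d_model = 2, a vocabulary of four tokens.
vocab = [" we", " our", " himself", " his"]
W_U = [
    [1.0, 0.8, -0.9, -0.5],
    [0.2, 0.1, 0.0, -0.3],
]
decoder_dir = [1.0, 0.5]

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With these toy numbers, " we" and " our" land in the positive list and " himself" and " his" in the negative list. This is the same split between first-person plural and third-person singular pronouns that the real lists show.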
    Activation Density: 0.937%
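Activation density is typically the share of token positions on which the feature is active. This sketch assumes "active" means an activation strictly above zero, because the page states neither the threshold nor the token sample. Under that assumption, 0.937% means the feature fires on roughly 9 of every 1,000 tokens. `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, the feature fires on 9 of them.
sample = [0.0] * 991 + [0.7] * 9
density = activation_density(sample)
print(f"{100 * density:.3f}%")  # prints 0.900%
```

Raising `threshold` lowers the reported density. Two pages can therefore report different densities for the same feature if they use different cutoffs.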

    No Known Activations