INDEX
    Explanations

    expressions related to societal critiques and personal accountability

    Negative Logits
    Token        Logit
    "agen"       -0.17
    "urette"     -0.15
    "Ĭ"          -0.15
    "Yep"        -0.15
    ".metro"     -0.15
    "imore"      -0.14
    " Yup"       -0.14
    " Nope"      -0.14
    "crew"       -0.14
    " Yep"       -0.14
    Positive Logits
    Token        Logit
    " ALSO"      0.17
    "also"       0.17
    " also"      0.17
    " Also"      0.16
    "Also"       0.15
    " yo"        0.15
    " thems"     0.15
    "fine"       0.14
    " hi"        0.14
    " Ñĩа"       0.14
    Activation Density: 0.173%

    No Known Activations