    Explanations

    expressions of criticism about societal or systemic issues

    Negative Logits
    Token             Logit
    " unlimited"      -0.15
    " tả"             -0.15
    "иÑģÑĮ"            -0.14
    "orman"           -0.14
    "ko"              -0.14
    "Projection"      -0.14
    " Champ"          -0.14
    "kiye"            -0.13
    " nonexistent"    -0.13
    "projection"      -0.13
    Positive Logits
    Token             Logit
    "aju"             0.16
    " moment"         0.15
    "andr"            0.15
    "igon"            0.15
    "acci"            0.15
    "amar"            0.15
    "gesi"            0.14
    "/inc"            0.14
    "coni"            0.14
    "etc"             0.14
    Activation Density: 0.014%

    No Known Activations