    Explanations

    instances of ridicule and criticism, particularly related to gender and social issues

    Negative Logits
    ieres
    -0.16
    供
    -0.15
    agina
    -0.15
    イク
    -0.14
    ystone
    -0.14
    lio
    -0.14
    iš
    -0.14
    bolt
    -0.14
     Sel
    -0.14
    ину
    -0.14
    Positive Logits
     repro
    0.17
    helm
    0.17
     lamp
    0.15
     lamb
    0.15
     about
    0.15
    aca
    0.14
    幸
    0.14
    queries
    0.14
    Mock
    0.14
     daring
    0.14
    Activation Density: 0.189%

    No Known Activations