INDEX
    Explanations

    expressions of harm related to LGBTQ+ issues

    Negative Logits
    pur         -0.15
    lots        -0.15
    ç«ĭãģ¦      -0.15
    IFS         -0.14
    ÏģοÏį       -0.14
    ifs         -0.14
    ilage       -0.14
    ipes        -0.13
    n           -0.13
    orton       -0.13
    Positive Logits
    926         0.14
    еди         0.14
     borderTop  0.14
     è»         0.13
    /documents  0.13
     RN         0.13
     вÑģÑĤ      0.13
    .Startup    0.13
    ีà¸Ĭ        0.13
    EDI         0.13
    Activation Density: 0.018%

    No Known Activations