INDEX
    Explanations

    references to safe and supportive spaces or environments for various communities

    Negative Logits
     مقام
    -0.15
    illa
    -0.14
    icot
    -0.14
    ACING
    -0.14
    ahoo
    -0.14
    ою
    -0.13
    ieber
    -0.13
     keyed
    -0.13
    åµ
    -0.13
    orb
    -0.13
    Positive Logits
     atmosphere
    0.22
     environment
    0.20
     ortam
    0.16
     атмос
    0.16
     stigma
    0.15
     Atmos
    0.15
    สำหร
    0.15
     safe
    0.14
     space
    0.14
    Indented
    0.14
    Activation Density: 0.084%

    No Known Activations