INDEX
    Explanations

    words related to violence or harm

    Negative Logits
    Token        Logit
    "ogle"       -0.15
    "تين"        -0.15
    "イヤ"       -0.14
    "aan"        -0.14
    "thro"       -0.14
    "анси"       -0.14
    "UTOR"       -0.13
    "USTER"      -0.13
    "992"        -0.13
    "anych"      -0.13
    Positive Logits
    Token             Logit
    " Presidency"     0.15
    "emoc"            0.15
    " RegexOptions"   0.15
    " FG"             0.14
    "WithName"        0.14
    "//{{"            0.14
    " Chairman"       0.14
    " Fay"            0.14
    "izzare"          0.14
    "istribute"       0.14
    Activation Density: 0.048%

    No Known Activations