INDEX
    Explanations

    informal dialogue

    expressions related to emotional harm or offense.

    Negative Logits
    Token              Logit
    " IST"             -0.07
    "/ad"              -0.06
    " distractions"    -0.06
    "Fcn"              -0.06
                       -0.06
    "[idx"             -0.06
                       -0.06
    " accordance"      -0.06
    " بأ"              -0.06
                       -0.06
    Positive Logits
    Token              Logit
    " revers"          0.07
    " dang"            0.06
    " cat"             0.06
    " numb"            0.06
    "reds"             0.06
                       0.06
    " Dayton"          0.06
    "pard"             0.06
    " shaving"         0.06
    " Hezbollah"       0.06
    Act Density 0.044%

    No Known Activations