    Negative Logits
     prisons              -0.06
    воз                   -0.06
     Symphony             -0.06
     harassment           -0.06
    (token not shown)     -0.06
    ovny                  -0.06
    اویر                  -0.06
     Stam                 -0.06
    responses             -0.06
     hora                 -0.06
    Positive Logits
     principal            0.06
     titan                0.06
     INTER                0.06
     رابط                 0.06
     grd                  0.06
    _CONDITION            0.06
     MARK                 0.06
     Detect               0.06
     pieces               0.06
     kh                   0.06
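Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that procedure; this page does not state how its lists were computed, and real dashboards may add steps such as layer-norm handling. The function `top_logit_tokens`, the toy shapes and the `tok{i}` vocabulary are hypothetical illustrations, not this dashboard's code.

```python
import numpy as np


def top_logit_tokens(w_dec_feature, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the displayed logits are the feature's direct effect on the
    output, i.e. its decoder vector multiplied by the unembedding matrix.

    w_dec_feature: decoder direction of one feature, shape (d_model,)
    w_unembed:     unembedding matrix, shape (d_model, vocab_size)
    vocab:         token strings, length vocab_size
    """
    logits = w_dec_feature @ w_unembed        # shape (vocab_size,)
    order = np.argsort(logits)                # indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with random weights (hypothetical sizes, not a real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
w_dec = rng.normal(size=d_model)
w_U = rng.normal(size=(d_model, vocab_size))

neg, pos = top_logit_tokens(w_dec, w_U, vocab, k=3)
print("Negative Logits")
for tok, val in neg:
    print(f"{tok}\t{val:.2f}")
print("Positive Logits")
for tok, val in pos:
    print(f"{tok}\t{val:.2f}")
```

Sorting once with `argsort` and reading both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.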
    Activation Density: 0.010%

    No Known Activations