INDEX
    Explanations

    inflammatory

    Negative Logits
     macht        -0.07
     empleado     -0.07
     Sons         -0.07
     الص          -0.07
     apo          -0.07
    ¨ط            -0.06
     suspects     -0.06
    KT            -0.06
     Eg           -0.06
     Select       -0.06
    Positive Logits
    cast          0.06
                  0.06
     pastry       0.06
    class         0.06
    Mailer        0.06
    .centerY      0.06
    razier        0.06
     waved        0.06
     даже         0.06
     pew          0.06
    Act Density 0.017%

    No Known Activations