INDEX
    Explanations
    Negative Logits
    " бал"      -0.07
    " sân"      -0.06
    "./("       -0.06
    " Hum"      -0.06
    " llegar"   -0.06
    " захід"    -0.06
    "(*)"       -0.06
    " ترك"      -0.06
    " Solic"    -0.06
    "ตรว"       -0.06
    Positive Logits
    "(enabled"     0.07
    "($('."        0.07
    " şiddet"      0.06
    "(formatter"   0.06
    " karş"        0.06
    "scenario"     0.06
    " BUT"         0.06
    "(pg"          0.06
                   0.06
    "TW"           0.06
    Act Density 0.000%

    No Known Activations