INDEX
    Explanations
    NEGATIVE LOGITS
    " Var"    -0.07
    "ında"    -0.06
              -0.06
              -0.06
    "aver"    -0.06
    " tyres"  -0.06
    "оль"     -0.06
    " arm"    -0.06
    " bom"    -0.06
    "olla"    -0.06
    POSITIVE LOGITS
    " паци"       0.07
    " projev"     0.06
    "Individual"  0.06
    " Helpful"    0.06
    "cision"      0.06
    " επίσης"     0.06
    " tweaked"    0.06
    " egregious"  0.06
    "ọng"         0.06
    " dialogRef"  0.06
    Activation Density 0.121%

    No Known Activations