INDEX
    Explanations

    instances of negative sentiments or expressions

    Negative Logits
    volution  -0.57
    قایناق‌لار  -0.54
     بتاريخ  -0.53
    rops  -0.52
     انت  -0.51
    <eos>  -0.51
     nahilalakip  -0.49
    )))),  -0.48
    }]);  -0.48
    воз  -0.48
    Positive Logits
    SBATCH  0.87
     OnTrigger  0.72
    Personensuche  0.70
    unnitel  0.69
    Autoritní  0.67
    kheim  0.67
    extAlignment  0.66
     thumbs  0.64
    gridx  0.62
    ########.  0.60
    Act Density 0.046%

    No Known Activations