INDEX
    Explanations

    punishments and warnings

    Negative Logits
     simplic        -0.08
     toplam         -0.08
     totaling       -0.08
    Tidak           -0.07
    irani           -0.07
    zana            -0.07
     ಪರ             -0.07
     bont           -0.07
     confortable    -0.07
     Gelen          -0.07
    Positive Logits
     deterr         0.13
     discour        0.11
     morale         0.10
     stär           0.10
     deter          0.09
     renforcer      0.08
     encourages     0.08
     урок           0.08
     appe           0.08
     коллектив      0.08
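    The two logit lists rank vocabulary tokens by how much this feature pushes their logits up or down. The page does not say how the scores are computed. The sketch below assumes a common convention: each token's score is the dot product of the feature's decoder vector with that token's column of the model's unembedding matrix. The function names, the toy vocabulary, and every number in the example are hypothetical.

```python
# Sketch only: assumes a feature's logit effect is its decoder vector dotted with
# each unembedding column. This convention and all names here are hypothetical.

def logit_effects(decoder_vec, unembed, vocab):
    """Map each token to decoder_vec . unembed[:, j] (unembed is d_model x vocab)."""
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(decoder_vec[k] * unembed[k][j] for k in range(len(decoder_vec)))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # smallest (most negative) scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a three-token vocabulary (made-up numbers).
    vocab = ["tok_a", "tok_b", "tok_c"]
    unembed = [[0.10, 0.05, -0.04],
               [0.06, 0.10, -0.08]]
    decoder_vec = [1.0, 0.5]
    pos, neg = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=1)
    print("positive:", pos)
    print("negative:", neg)
```

    With these toy values, tok_a scores about 0.13 and tok_c about -0.08, so they head the positive and negative lists respectively.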
    Activation Density 0.047%
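    An activation density of 0.047% means the feature fires on roughly 47 out of every 100,000 tokens. The sketch below assumes that density is the fraction of token positions where the activation is strictly above zero. The function name and the zero threshold are assumptions, not taken from this page.

```python
# Sketch only: assumes "activation density" means the fraction of token positions
# where the feature's activation is strictly above zero. The threshold is an assumption.

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds threshold (0.0 if empty)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # 47 active positions out of 100,000 total positions.
    acts = [0.0] * 99_953 + [1.2] * 47
    print(f"{activation_density(acts):.3%}")
```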

    No Known Activations