    Explanations

    discipline or punishment

    NEGATIVE LOGITS
    " sinh"     -0.07
    " Tanks"    -0.07
    (blank)     -0.07
    "\tdc"      -0.06
    " Plan"     -0.06
    "ursos"     -0.06
    "社會"      -0.06
    " подв"     -0.06
    " plan"     -0.06
    "기의"      -0.06
    POSITIVE LOGITS
    " Не"               0.07
    "classCallCheck"    0.06
    "odus"              0.06
    "igmoid"            0.06
    "Не"                0.06
    " Λ"                0.06
    "always"            0.06
    " willingly"        0.06
    " Rav"              0.06
    " Athen"            0.06
    Act Density 0.073%

    No Known Activations