    Explanations

    negative descriptors related to conditions or experiences

    Negative Logits
    Token            Logit
    " Züge"          -0.47
    " ciepła"        -0.38
    "fromnode"       -0.36
    "phosa"          -0.36
    " tabung"        -0.35
    "lète"           -0.34
    " veremos"       -0.34
    "Koordinaten"    -0.34
    "iyor"           -0.34
    "ărilor"         -0.34
    Positive Logits
    Token            Logit
    " bad"           1.08
    " Bad"           1.02
    "Bad"            0.97
    " BAD"           0.93
    "bad"            0.92
    "BAD"            0.85
    " badly"         0.65
    " mauvais"       0.62
    " mauvaise"      0.57
    " luck"          0.57
    Activation Density 0.084%
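
    The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; it assumes "fires" means an activation above zero, and the function name `activation_density` and the sample counts are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 84 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 84 + [0.0] * 99_916
print(f"{activation_density(sample):.3f}%")
```

    Under that assumption, a density of 0.084% corresponds to roughly 84 active tokens per 100,000 sampled.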

    No Known Activations