INDEX
    Explanations

    phrases indicating changes or improvements in various contexts

    Negative Logits
    " heavier"      -0.18
    " heavily"      -0.16
    " harder"       -0.16
    " worse"        -0.16
    "ardash"        -0.14
    " worsening"    -0.14
    " heavy"        -0.13
    "ÙĬÙĪÙĨ"        -0.13
    "erdale"        -0.13
    "agini"         -0.13
    Positive Logits
    "sign"          0.33
    " Sign"         0.31
    " SIGN"         0.31
    " sign"         0.30
    "SIGN"          0.26
    "stant"         0.24
    "Sign"          0.24
    " apprec"       0.23
    "-sign"         0.23
    " Dram"         0.23
    Activation Density: 0.152%

    No Known Activations