INDEX
    Explanations

    words related to enhancement or improvement

    phrases indicating a positive contribution or improvement

    Negative Logits
    "avior"           -0.82
    " Advertisement"  -0.74
    "AAAAAAAA"        -0.64
    "aneous"          -0.62
    "cz"              -0.61
    "irus"            -0.60
    "DH"              -0.60
    "unn"             -0.60
    "CDC"             -0.60
    " apologized"     -0.59
    Positive Logits
    " humankind"       0.74
    "otos"             0.73
    " srf"             0.72
    " accompany"       0.67
    "axy"              0.67
    "tones"            0.66
    "ggles"            0.66
    " compensate"      0.66
    " mankind"         0.66
    "ound"             0.65
    Activation Density 0.133%

    No Known Activations