INDEX
    Explanations

    terms related to moderation or alleviation of conditions or factors

    Negative Logits
    ness            -0.87
    iness           -0.84
    celotti         -0.78
    ings            -0.71
     whor           -0.69
     Ches           -0.69
    w               -0.68
    innerText       -0.66
    est             -0.66
    iverr           -0.65
    Positive Logits
    uate            1.05
    ATED            0.98
    uminate         0.98
    ated            0.98
    ViewFeatures    0.95
    cated           0.94
     themſelves     0.93
    ating           0.90
     myſelf         0.89
    IVATE           0.87
    Activation Density: 0.524%

    No Known Activations