    Negative Logits
    Token             Logit
    " welf"           -0.66
    "xual"            -0.66
    " unsupported"    -0.62
    " DRAG"           -0.62
    " compe"          -0.61
    " laun"           -0.61
    " hypot"          -0.58
    " caution"        -0.58
    " otherwise"      -0.58
    " bearer"         -0.57
    Positive Logits
    Token             Logit
    "uggets"          1.21
    "ovation"         1.14
    "ihil"            1.10
    "aturally"        1.10
    "ucle"            1.07
    "umerous"         1.05
    "vironment"       1.05
    "orthern"         1.04
    "ounced"          1.01
    "onsense"         0.99
    Activation Density: 0.047%

    No Known Activations