INDEX
    Explanations

    phrases describing positive impact or success

    Negative Logits
    Token         Logit
    "apache"      -0.76
    "isco"        -0.73
    "apons"       -0.72
    "mares"       -0.71
    "js"          -0.71
    "each"        -0.70
    " exceeds"    -0.70
    "these"       -0.68
    " existed"    -0.67
    " exists"     -0.67
    Positive Logits
    Token           Logit
    " easiest"      1.15
    " same"         1.13
    " simplest"     1.07
    " extent"       1.07
    " safest"       1.06
    " biggest"      1.05
    " gist"         1.04
    " toughest"     1.01
    " strongest"    1.01
    " hallmark"     1.00
    Activation Density: 0.136%

    No Known Activations