INDEX
    Explanations
    Negative Logits
    Token             Logit
    "rouse"           -0.77
    "âĨij"            -0.68
    "ierrez"          -0.67
    "Nich"            -0.67
    "inks"            -0.64
    " Brach"          -0.64
    "æ©"              -0.63
    "Yan"             -0.63
    " Replacement"    -0.63
    " Higgins"        -0.62
    Positive Logits
    Token             Logit
    "cffffcc"         0.92
    " released"       0.88
    " replaced"       0.87
    " deprecated"     0.86
    " updated"        0.85
    " hijacked"       0.82
    " indicted"       0.82
    " awhile"         0.82
    " upgraded"       0.81
    " moved"          0.81
    Activation Density: 0.045%

    No Known Activations