    Negative Logits
    Token            Logit
    " proactive"     -0.08
    " proactively"   -0.08
    "yny"            -0.07
    "orton"          -0.07
    " pener"         -0.07
    "gens"           -0.07
    "illustr"        -0.07
    "Specified"      -0.07
    "īn"             -0.07
    "IID"            -0.07
    Positive Logits
    Token            Logit
    " weakened"      0.11
    " corrupted"     0.11
    " fragmented"    0.11
    " impaired"      0.10
    " amidst"        0.10
    " muff"          0.10
    " whispered"     0.10
                     0.10
    " gar"           0.10
    " malformed"     0.09
    Activation Density: 0.010%

    No Known Activations