INDEX
    Negative Logits
    Token      Logit
    "ibur"     -0.79
    "verbs"    -0.73
    "iop"      -0.72
    "kered"    -0.69
    "ebook"    -0.68
    "flix"     -0.65
    "ibu"      -0.65
    "Cola"     -0.64
    "essee"    -0.63
    "ella"     -0.63
    Positive Logits
    Token           Logit
    " undone"       0.80
    " from"         0.78
    "ments"         0.69
    " airport"      0.69
    "doms"          0.66
    " departure"    0.63
    "ment"          0.62
    " depart"       0.62
    "untled"        0.61
    " Wast"         0.61
    Activation Density: 0.026%

    No Known Activations