INDEX
    Negative Logits
    Token        Logit
    "ighed"      -1.31
    "mbal"       -1.27
    "athered"    -1.25
    "aret"       -1.23
    "ebly"       -1.23
    "ḿ"          -1.18
    "Theres"     -1.16
    "eping"      -1.15
    "mogen"      -1.13
    "theres"     -1.12
    Positive Logits
    Token        Logit
    " is"        3.00
    " was"       2.95
    " will"      2.55
    " has"       2.03
    " may"       1.76
    " would"     1.66
    " could"     1.59
    " should"    1.59
    " might"     1.59
    " shouldn"   1.58
    Act Density 0.147%

    No Known Activations