    Negative Logits
    become        -1.38
     Become       -1.27
    been          -1.25
     become       -1.21
     Efq          -1.11
    Been          -1.10
     itſelf       -1.10
     anún         -1.09
     myſelf       -1.07
     deviennent   -1.06
    Positive Logits
     a            1.19
     an           1.00
     the          0.93
     "            0.87
     “            0.85
     able         0.82
     part         0.79
                  0.79
     more         0.78
     '            0.75
    Activation Density: 0.165%

    No Known Activations