    NEGATIVE LOGITS
    Token        Logit
    " up"        -2.52
    " Up"        -1.67
    "Up"         -1.51
    " upto"      -1.49
    "up"         -1.39
    "upto"       -1.27
    "jusqu"      -1.12
    " UP"        -1.08
    " lên"       -1.08
    " يتيمه"     -1.03
    POSITIVE LOGITS
    Token        Logit
    " to"        0.97
    " the"       0.89
    ","          0.83
    "!"          0.82
    " and"       0.79
    "."          0.77
    " a"         0.76
    "-"          0.75
    " or"        0.73
    " for"       0.69
    Activation Density: 0.085%

    No Known Activations