INDEX
    NEGATIVE LOGITS
    1               -2.69
    7               -2.59
    2               -2.52
    6               -2.42
    5               -2.38
    4               -2.38
    3               -2.31
    8               -2.22
     and            -2.17
    0               -2.08
    POSITIVE LOGITS
     a              3.25
     not            2.94
     now            2.52
     also           2.11
     Which          2.05
     an             2.03
     usually        1.98
     interpreta     1.98
     morfo          1.94
     just           1.93
    Act Density 0.114%

    No Known Activations