INDEX
    Explanations
    NEGATIVE LOGITS
    Token            Logit
    " listen"        -2.27
    " listens"       -2.19
    " listened"      -2.06
    " listening"     -2.03
    " Listen"        -1.96
    "Listen"         -1.95
    "listening"      -1.77
    "listen"         -1.75
    " Listening"     -1.72
    " listeners"     -1.65
    POSITIVE LOGITS
    Token            Logit
    " to"            0.93
    "/"              0.62
    " and"           0.59
    "esen"           0.58
    "int"            0.55
    "ener"           0.53
    "-"              0.51
    " M"             0.51
    ","              0.51
    "bee"            0.50
    Act Density 0.110%

    No Known Activations