    Negative Logits
    " compilers"    -0.06
    "_dense"        -0.06
    "frog"          -0.06
    " Carr"         -0.06
    "uses"          -0.06
    " côt"          -0.06
    "pane"          -0.06
    "larıyla"       -0.06
    "(tweet"        -0.06
    "_fil"          -0.06
    Positive Logits
    " EZ"           0.07
    " TO"           0.07
    ".SOCK"         0.07
    " Recognition"  0.07
    "GX"            0.07
    " plung"        0.06
    " lying"        0.06
    " LIB"          0.06
    "izik"          0.06
    "_SELECTOR"     0.06
    Act Density 0.001%

    No Known Activations