    Negative Logits
    "aja"          -0.07
    "259"          -0.07
    "(cursor"      -0.07
    " dwar"        -0.07
    "129"          -0.06
    "owel"         -0.06
    "ระบ"          -0.06
    "коп"          -0.06
    "xcb"          -0.06
    "ンの"         -0.06
    Positive Logits
    "etrain"           0.06
    " strawberries"    0.06
    " Visible"         0.06
    " Golf"            0.06
    " Mention"         0.06
                       0.06
    " Clearly"         0.06
    " beige"           0.06
    " GOLD"            0.06
    "=wx"              0.06
    Activation Density 0.007%

    No Known Activations