INDEX
    Negative Logits
    Token                 Logit
    " nicely"             -0.09
    " neatly"             -0.08
    "تحسين"               -0.07
    " rebuilding"         -0.07
    " Drinking"           -0.07
    " ninth"              -0.07
    " Building"           -0.07
    " Cheat"              -0.07
    (token not shown)     -0.07
    (token not shown)     -0.07
    Positive Logits
    Token                 Logit
    "*,"                  0.09
    (token not shown)     0.07
    "bla"                 0.07
    " maxi"               0.07
    "*"                   0.07
    (token not shown)     0.06
    "ably"                0.06
    (token not shown)     0.06
    " ')"                 0.06
    "manız"               0.06
    Activation Density: 0.023%

    No Known Activations