INDEX
    NEGATIVE LOGITS
    Token         Logit
    "uming"       -0.08
    "camp"        -0.07
    "'''"         -0.07
    " Slight"     -0.07
    "sx"          -0.07
    "-we"         -0.07
    ",h"          -0.07
    ":::"         -0.07
    "ultan"       -0.07
    "Sc"          -0.07
    POSITIVE LOGITS
    Token             Logit
    " منهم"           0.11
    " منها"           0.09
    " qa"             0.09
    " poput"          0.08
    " duplicates"     0.08
    " Guns"           0.08
    "ציג"             0.08
    " outright"       0.08
    " доступны"       0.08
    " قي"             0.08
    Activation Density: 0.059%

    No Known Activations