    NEGATIVE LOGITS
    " fu"              -0.08
    "998"              -0.08
    (whitespace)       -0.08
    " قو"              -0.08
    (token missing)    -0.08
    " uncont"          -0.07
    "öch"              -0.07
    " مع"              -0.07
    " Zu"              -0.07
    " قاب"             -0.07
    POSITIVE LOGITS
    "ones"             0.08
    " Lanc"            0.08
    "hire"             0.08
    " Myself"          0.08
    "aint"             0.07
    "-j"               0.07
    " Verge"           0.07
    " abrupt"          0.07
    "oral"             0.07
    " ці"              0.07
    Activation Density: 0.099%

    No Known Activations