    Negative Logits
    "drive"               -0.08
    "alisa"               -0.08
    "Ma"                  -0.08
    (token not shown)     -0.07
    "shipping"            -0.07
    "umers"               -0.07
    " drinking"           -0.07
    "-fashion"            -0.07
    (token not shown)     -0.07
    " матер"              -0.07
    Positive Logits
    " pict"               0.08
    " Ged"                0.08
    " SPC"                0.08
    " ב"                  0.08
    " SCM"                0.08
    " xyz"                0.07
    " Penn"               0.07
    " وح"                 0.07
    " Hop"                0.07
    " Knight"             0.07
    Activation Density: 0.003%

    No Known Activations