INDEX
    Explanations
    Negative Logits
    Token     Logit
     Vand     -0.08
     Overse   -0.08
    pe        -0.07
     لح       -0.07
    lse       -0.07
     Sparse   -0.07
     Horse    -0.07
     sce      -0.06
     yavaş    -0.06
    ."),      -0.06
    Positive Logits
    Token     Logit
    ัก        0.08
    l         0.08
    i         0.08
    1         0.07
    all       0.07
    I         0.07
    ali       0.07
    _I        0.07
    (al       0.07
     I        0.07
    Activation Density: 0.057%

    No Known Activations