INDEX
    Negative Logits
    Token          Logit
     alış          -0.07
                   -0.07
                   -0.07
    ided           -0.07
    的帮助下       -0.07
     hear          -0.07
                   -0.07
     persuaded     -0.07
                   -0.07
    bett           -0.07

    Positive Logits
    Token          Logit
    гран           0.09
                   0.07
    schließen      0.07
                   0.07
     Painting      0.07
    演艺           0.07
     khoảng        0.07
    flower         0.07
    酱油           0.06
     umbrella      0.06
    Act Density 0.021%

    No Known Activations