    Explanations
    No Explanations Found
    Negative Logits
    ;border        -0.07
    schlie         -0.07
    prefer         -0.07
    _exc           -0.07
     lesbisk       -0.07
     Xiao          -0.06
    /win           -0.06
    עמי            -0.06
    likely         -0.06
    neo            -0.06
    Positive Logits
    (no visible token)   0.07
     מוצרים        0.07
    าร             0.07
    jaw            0.07
     Dx            0.07
    (no visible token)   0.06
    (no visible token)   0.06
    vn             0.06
     לעומת         0.06
     "'            0.06
    Activation Density 0.029%

    No Known Activations