INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token    Logit
    ار       1.59
    w        1.57
    il       1.55
    ad       1.45
    am       1.45
    ates     1.42
    ING      1.38
    m        1.38
    v        1.38
             1.37
    Positive Logits
    Token    Logit
    د        2.00
    이나     1.80
     וכ      1.66
    ي        1.66
    ពេល      1.64
     وبعد    1.52
    ى        1.48
    ل        1.46
    ный      1.40
    에는     1.38
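    Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then ranking the per-token scores. The page does not say how these particular values were computed, so the sketch below assumes that method. The helpers `logit_effects` and `top_logits` are hypothetical, and plain Python lists stand in for model tensors. The sketch returns signed scores.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Tuple[str, float]]:
    """Score each vocabulary token by decoder_row @ unembed.

    Assumption: decoder_row is the feature's decoder direction (length d_model)
    and unembed is a d_model x vocab_size matrix (hypothetical toy inputs).
    """
    d_model = len(decoder_row)
    effects = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with column j of the unembedding.
        score = sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        effects.append((tok, score))
    return effects


def top_logits(effects: List[Tuple[str, float]],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    # Most negative scores first.
    negative = sorted(effects, key=lambda p: p[1])[:k]
    # Most positive scores first.
    positive = sorted(effects, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive
```

    For example, `top_logits(logit_effects(row, W_U, vocab), 10)` would yield two ten-entry lists shaped like the ones above.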
    Activation density: 0.033%
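    Activation density is usually the share of tokens on which the feature fires. If "fires" means an activation above zero (an assumption here), 0.033% is a fraction of 0.00033, or roughly one token in 3,000 (1 / 0.00033 ≈ 3,030). A minimal sketch of that calculation, using the hypothetical helper `activation_density`:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active on a token when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Avoid dividing by zero when no tokens were observed.
    if total == 0:
        return 0.0
    return 100.0 * active / total
```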

    No Known Activations