    Explanations

    lunch break or conversation

    NEGATIVE LOGITS
    an       1.78
    on       1.60
    in       1.32
    u        1.27
    en       1.25
    ap       1.04
    st       1.01
    com      1.01
    model    1.00
    it       0.99
    POSITIVE LOGITS
    :        1.14
    ي        1.14
    не       1.12
    ع        1.05
     for     1.03
    м        1.03
    يات      1.02
    з        1.02
    ك        1.02
    й        1.00
    Activation Density: 0.003%

    No Known Activations