    NEGATIVE LOGITS
    其他    0.84
    [token missing]    0.79
    [token missing]    0.78
    [token missing]    0.75
    [token missing]    0.73
    [token missing]    0.69
    д    0.68
    ات    0.68
    لي    0.68
    ב    0.68
    POSITIVE LOGITS
    the    1.09
    il    0.88
    s    0.81
    ri    0.80
    0    0.72
    they    0.64
    are    0.64
    ir    0.63
    was    0.63
    ut    0.61
    Act Density 0.015%

    No Known Activations