    Explanations
    No Explanations Found
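    Negative Logits
    с                     0.59
    س                     0.49
    (no visible token)    0.46
    "                     0.44
    I                     0.44
    (no visible token)    0.44
    (no visible token)    0.44
    (no visible token)    0.41
    ని                    0.41
    (no visible token)    0.40
    Positive Logits
    x                     0.68
    ad                    0.62
    is                    0.61
    ul                    0.52
    at                    0.52
    an                    0.51
    ق                     0.50
    (no visible token)    0.49
    xv                    0.48
    xk                    0.48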
    Act Density 4.525%

    No Known Activations