    Negative Logits
    Token                Value
    "ي"                  0.94
    "י"                  0.84
    "ก็"                  0.81
    (token not shown)    0.81
    "بي"                 0.81
    "ه"                  0.75
    (token not shown)    0.75
    " it"                0.73
    "מ"                  0.73
    "س"                  0.72
    Positive Logits
    Token                Value
    "in"                 0.88
    " appears"           0.84
    " appear"            0.82
    "im"                 0.80
    "is"                 0.73
    " Appear"            0.71
    "রা"                 0.71
    "at"                 0.68
    "appears"            0.68
    "il"                 0.66
    Activation Density: 0.018%

    No Known Activations