    Negative Logits
    Token        Logit
    "ה"          0.98
    "ת"          0.87
    "ه"          0.83
    "ל"          0.81
    (not shown)  0.81
    "s"          0.80
    "א"          0.77
    "ים"         0.71
    (not shown)  0.69
    "ك"          0.66
    Positive Logits
    Token        Logit
    " Polo"      0.95
    " polo"      0.86
    "polo"       0.80
    " you"       0.73
    " polos"     0.72
    " is"        0.64
    "meric"      0.64
    " آنا"       0.61
    " chevaux"   0.60
    "</b>"       0.59
    Activation Density: 0.001%

    No Known Activations