    Explanations

    bracketed text or placeholders

    Negative Logits
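        Token              Logit
        "ية"               0.78
        "не"               0.76
        " (.)"             0.68
        (not recovered)    0.61
        (not recovered)    0.58
        "ى"                0.57
        (not recovered)    0.57
        (not recovered)    0.56
        (not recovered)    0.56
        (not recovered)    0.56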
    Positive Logits
        Token     Logit
        "!]"      1.19
        "?]"      1.18
        "ל"       1.09
        ",]"      1.03
        "י"       0.89
        "+]"      0.86
        "()]"     0.86
        "ب"       0.86
        " ]"      0.85
        "ED"      0.84
    Activation Density: 0.190%

    No Known Activations