INDEX
    Explanations

    understanding and consideration

    Negative Logits
    (blank)   0.96
    ،   0.90
    (blank)   0.82
    (blank)   0.78
    ül   0.75
    ny   0.74
    nel   0.72
    ng   0.71
    ltry   0.71
    ,“   0.71
    Positive Logits
    ר   1.31
    ور   1.12
    :   1.02
    (blank)   1.02
    ي   1.00
    ב   0.98
    ות   0.97
    n   0.97
    ה   0.97
    ر   0.95
    Act Density 0.847%

    No Known Activations