    Negative Logits
    Token      Logit
    "in"       1.63
    "p"        1.55
    "م"        1.50
    "ה"        1.40
    "ل"        1.38
    (blank)    1.31
    "ל"        1.29
    "a"        1.27
    "ع"        1.25
    "ם"        1.18
    Positive Logits
    Token      Logit
    " semi"    1.11
    "-"        1.02
    "↵↵"       0.88
    "áról"     0.84
    "ian"      0.83
    (blank)    0.80
    "rives"    0.77
    " дали"    0.77
    "<0xA8>"   0.76
    "ull"      0.76
    Act Density 0.004%

    No Known Activations