    NEGATIVE LOGITS
    Token                Logit
    (token not shown)    0.91
    (token not shown)    0.84
    " is"                0.74
    "اد"                 0.73
    " الك"               0.72
    "رك"                 0.71
    " powied"            0.71
    "ал"                 0.71
    (token not shown)    0.69
    " الكتاب"            0.68
    POSITIVE LOGITS
    Token                Logit
    "u"                  1.33
    "i"                  1.05
    "an"                 0.98
    "a"                  0.95
    "er"                 0.91
    "ER"                 0.88
    "?"                  0.87
    "d"                  0.85
    (token not shown)    0.85
    "s"                  0.76
    Activation Density: 0.000%

    No Known Activations