    Negative Logits
    Token       Value
    "v"         1.18
    "c"         0.97
    "p"         0.95
    ":"         0.93
    " can"      0.93
    "can"       0.91
    "thern"     0.90
    "ca"        0.89
    "kan"       0.89
    "ali"       0.88
    Positive Logits
    Token       Value
    "لية"       1.33
    "ل"         1.05
    (not shown) 1.02
    "اتي"       1.00
    " árv"      0.98
    "سين"       0.96
    "لل"        0.96
    "ني"        0.95
    "ták"       0.95
    "باد"       0.94
    Act Density 0.001%

    No Known Activations