INDEX
    Explanations

    list separators and questions

    Negative Logits
    " to"      1.02
    " is"      1.00
    "s"        0.71
    " a"       0.69
    " with"    0.68
    " has"     0.68
    "اری"      0.68
    "𝑠"        0.66
    [missing]  0.65
    "ند"       0.64
    Positive Logits
    "д"        1.08
    "л"        1.01
    "м"        0.98
    "ة"        0.94
    "ли"       0.87
    "o"        0.85
    "ه"        0.84
    "ى"        0.84
    [missing]  0.81
    "т"        0.79
    Activation Density: 1.459%

    No Known Activations