    Negative Logits
    Token    Logit
    "r"    2.02
    "a"    1.90
    "c"    1.48
    "ى"    1.44
    "."    1.37
    "b"    1.33
    "n"    1.28
    "z"    1.25
    "f"    1.21
    "v"    1.20
    Positive Logits
    Token    Logit
    "م"    1.08
    " pleads"    1.00
    "У"    1.00
    " flocks"    0.99
    "فير"    0.98
    "А"    0.91
    " کي"    0.91
    " to"    0.90
    " plea"    0.90
    " футболдук"    0.90
    Activation Density: 0.005%

    No Known Activations