INDEX
    Explanations

    creating new content or narratives

    Negative Logits
     on  1.05
    ول  1.03
    с  1.02
    ור  0.96
    ра  0.96
     to  0.94
     as  0.94
    (token not shown)  0.90
    ри  0.89
    ے  0.88
    Positive Logits
    in  1.77
    u  1.70
    t  1.62
    ar  1.41
    z  1.34
    b  1.33
    is  1.28
    k  1.21
    ad  1.11
    w  1.09
    Activation Density: 0.085%

    No Known Activations