    Negative Logits
    Token                 Value
    [not rendered]        2.55
    "з"                   2.03
    "ı"                   1.89
    "zäh"                 1.88
    " Надо"               1.80
    [not rendered]        1.73
    " pâle"               1.66
    "ا"                   1.63
    "いや"                1.62
    [not rendered]        1.62

    Positive Logits
    Token                 Value
    "l"                   2.42
    "oughby"              2.20
    " tetapi"             1.95
    "اً"                  1.80
    "🔥🔥"                1.74
    "lly"                 1.65
    "AY"                  1.64
    "tedir"               1.64
    " favorably"          1.64
    " répon"              1.63
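Lists like the two above rank vocabulary tokens by how strongly the feature pushes each token's logit down or up. The sketch below assumes each score is the dot product of the feature's decoder vector with the corresponding column of the model's unembedding matrix; `w_dec`, `W_U`, the helper names, and the toy numbers are all hypothetical and not taken from this feature. It returns the k lowest and k highest scoring tokens.

```python
def logit_effect(w_dec, W_U):
    """Score each vocab token as w_dec . W_U[:, v] (assumed definition).

    w_dec: decoder vector of length d_model (hypothetical input).
    W_U:   unembedding matrix as d_model rows of length n_vocab.
    """
    if len(W_U) != len(w_dec):
        raise ValueError("W_U must have one row per element of w_dec")
    n_vocab = len(W_U[0])
    return [
        sum(w_dec[d] * W_U[d][v] for d in range(len(w_dec)))
        for v in range(n_vocab)
    ]


def top_logits(effect, vocab, k=10):
    """Return (most negative k, most positive k) as (token, value) pairs."""
    if len(effect) != len(vocab):
        raise ValueError("effect and vocab must have the same length")
    pairs = list(zip(vocab, effect))
    # Ascending order puts the most negative scores first.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    # Descending order puts the most positive scores first.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example with a 2-dimensional model and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec = [1.0, -1.0]
W_U = [
    [2.0, 0.5, -1.0, 0.0],
    [0.5, 1.5, 1.0, -2.0],
]
neg, pos = top_logits(logit_effect(w_dec, W_U), vocab, k=2)
print(neg)  # [('c', -2.0), ('b', -1.0)]
print(pos)  # [('d', 2.0), ('a', 1.5)]
```

Sorting the full vocabulary is simple but costs O(V log V); for large vocabularies, `heapq.nsmallest` and `heapq.nlargest` would return the same top-k lists more cheaply.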
    Activation density: 0.484%
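Activation density is commonly reported as the percentage of sampled tokens on which the feature's activation exceeds zero. Assuming that definition (the sampled dataset and the zero threshold are assumptions, not stated here), 0.484% means the feature fires on roughly 5 of every 1,000 tokens. A minimal sketch, with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold.

    The zero default threshold is an assumption; dashboards may differ.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 5 active tokens out of 1,000.
sample = [0.0] * 995 + [0.3] * 5
print(activation_density(sample))  # 0.5
```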

    No Known Activations