INDEX
    Explanations
    Negative Logits
    Token          Logit
    " uży"         0.91
    " keluar"      0.86
    "СК"           0.82
    "มี"            0.75
    (not shown)    0.72
    (not shown)    0.72
    " as"          0.71
    " kullanıcı"   0.71
    (not shown)    0.71
    " çıktı"       0.70
    Positive Logits
    Token     Logit
    "im"      1.02
    "k"       0.86
    "g"       0.82
    "c"       0.80
    "x"       0.79
    "acly"    0.72
    "τίας"    0.71
    "m"       0.71
    "ina"     0.70
    "pire"    0.69
    Activation Density: 0.002%

    No Known Activations