    Explanations

    self-attention within sequences

    Negative Logits
      Token            Logit
      "baş"            0.51
      "edged"          0.50
      "hanging"        0.47
      "başı"           0.46
      "kový"           0.46
      "ances"          0.46
      "spent"          0.45
      "pmap"           0.45
      "verbs"          0.44
      "transistors"    0.43
    Positive Logits
      Token            Logit
      (not shown)      0.47
      (not shown)      0.44
      " млад"          0.44
      "\\"             0.43
      (not shown)      0.42
      (not shown)      0.42
      (not shown)      0.41
      "他にも"          0.40
      "ة"              0.40
      " cru"           0.40
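
    The two lists above appear to rank vocabulary tokens by this feature's effect on the output logits. A minimal sketch of how such top-k lists could be produced, assuming each token already has one scalar effect value (for example, the feature's decoder direction projected onto the unembedding matrix; this is an assumption, not something the page states). The function name `top_logits` and the toy values are hypothetical.

```python
# Hypothetical sketch: rank per-token logit effects into top positive and
# top negative lists. ASSUMPTION: every token already has one scalar effect
# value; how that value is computed is not stated on the page.

def top_logits(effects, k=10):
    """Return (positive, negative) lists of (token, value) pairs.

    effects: dict mapping token string -> scalar logit effect.
    positive: the k largest effects, largest first.
    negative: the k smallest effects, most negative first.
    """
    # Sort once, ascending by effect value.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # lowest values come first
    positive = ranked[::-1][:k]    # reverse so the highest values come first
    return positive, negative


if __name__ == "__main__":
    # Toy values invented for illustration; not taken from this feature.
    toy = {"a": 0.3, "b": -0.5, "c": 0.1, "d": -0.2, "e": 0.7}
    pos, neg = top_logits(toy, k=2)
    print(pos)  # [('e', 0.7), ('a', 0.3)]
    print(neg)  # [('b', -0.5), ('d', -0.2)]
```

    In this sketch the negative list keeps its minus sign; the page above shows those values without one, so it may be displaying magnitudes.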
    Activation density: 0.004%
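
    Activation density is presumably the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero over some evaluation set of tokens (the function name `activation_density` and the `threshold` parameter are hypothetical). Under that definition, 0.004% works out to about 4 active tokens per 100,000.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# ASSUMPTION: a token counts as "active" when its activation exceeds threshold.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        return 0.0  # avoid division by zero on an empty sample
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 4 active positions out of 100,000 tokens.
    acts = [0.0] * 99_996 + [1.2] * 4
    print(activation_density(acts))  # 0.004
```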

    No Known Activations