INDEX
    Explanations

    instances of attention and focus in the text

    Negative Logits
    Token        Logit
    "obel"       -0.16
    "iazza"      -0.14
    "stoff"      -0.14
    "KERNEL"     -0.14
    "uyện"       -0.14
    "lops"       -0.14
    "änder"      -0.13
    "/autoload"  -0.13
    "isse"       -0.13
    "PTY"        -0.13
    Positive Logits
    Token        Logit
    " We"         0.16
    "endir"       0.15
    " intact"     0.15
    "unker"       0.15
    " Dev"        0.15
    "ometrics"    0.14
    " Bent"       0.14
    "enance"      0.14
    " My"         0.14
    "ude"         0.13
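The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The following is a minimal sketch of how such a ranking could be produced. It assumes that a token's logit effect is the feature's decoder direction dotted with that token's column of the unembedding matrix W_U, and it ignores any final LayerNorm. The helpers `logit_effects` and `top_and_bottom` and the toy matrices are hypothetical, not the dashboard's actual code:

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with every
    column of the unembedding matrix (d_model rows x vocab columns).
    Assumption: no LayerNorm or bias is applied before the unembedding."""
    d_model = len(decoder_dir)
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
            for t in range(vocab)]


def top_and_bottom(tokens: Sequence[str], effects: Sequence[float],
                   k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Toy example with made-up values: d_model = 2, vocabulary of 4 tokens.
# Leading spaces are part of the token, as in the lists above.
tokens = ["obel", " We", "iazza", " intact"]
decoder_dir = [1.0, 1.0]
unembed = [[-0.2, 0.2, -0.1, 0.1],
           [0.0, 0.0, 0.0, 0.0]]
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(tokens, effects, k=2)
print(negative)  # [('obel', -0.2), ('iazza', -0.1)]
print(positive)  # [(' We', 0.2), (' intact', 0.1)]
```

In a real model the same ranking would be taken over the full vocabulary with k = 10, which matches the length of each list on this page.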
    Activation Density: 0.009%
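Activation density is the share of sampled tokens on which the feature fires. At 0.009%, that is roughly 9 of every 100,000 tokens. The following is a minimal sketch of the calculation. It assumes that "fires" means an activation strictly greater than zero; the dashboard's exact threshold and sample size are not stated here, and `activation_density` is a hypothetical helper:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds threshold.
    Assumption: a feature counts as active when its activation is > threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 9 firing positions out of 100,000 tokens gives a density of 0.009%.
acts = [0.0] * 99_991 + [0.7] * 9
print(f"{activation_density(acts):.3f}%")  # prints 0.009%
```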

    No Known Activations