INDEX
    Explanations

    phrases emphasizing concentration or direct attention

    Negative Logits
    Token           Logit
    "ishly"         -0.20
    "ish"           -0.19
    "orce"          -0.17
    "aly"           -0.17
    "iggers"        -0.16
    "ity"           -0.16
    "hiba"          -0.15
    "eu"            -0.15
    "utan"          -0.15
    "een"           -0.15

    Positive Logits
    Token           Logit
    " attention"    0.23
    " areas"        0.22
    "SED"           0.21
    " area"         0.20
    " point"        0.20
    " Areas"        0.19
    "(es"           0.18
    "point"         0.18
    "-area"         0.18
    " shifted"      0.18
    Activation Density: 0.041%

    No Known Activations