INDEX
    Explanations

    instances where attention is being discussed or emphasized

    mentions of attention and its implications or effects

    Negative Logits
    Token             Logit
    " Tale"           -0.70
    " Yugoslavia"     -0.65
    " Rebell"         -0.63
    " Yar"            -0.63
    "UES"             -0.62
    " Dani"           -0.62
    "lins"            -0.61
    " halves"         -0.60
    " Rouge"          -0.60
    " Harbour"        -0.60
    Positive Logits
    Token             Logit
    "estinal"         0.88
    " spans"          0.87
    " span"           0.86
    "orial"           0.85
    " attention"      0.83
    "ively"           0.79
    " largeDownload"  0.79
    "bender"          0.79
    " seeker"         0.78
    " seekers"        0.77
    Activation Density: 0.028%

    No Known Activations