    Explanations

    phrases related to paying attention

    phrases emphasizing the importance of paying attention

    Negative Logits
    Token          Logit
    " halves"      -0.79
    "tre"          -0.70
    " hunt"        -0.66
    "riot"         -0.65
    "ods"          -0.65
    "versions"     -0.63
    "wi"           -0.61
    " Townsend"    -0.61
    "CS"           -0.61
    " Mehran"      -0.60
    Positive Logits
    Token          Logit
    " attention"   0.86
    "othal"        0.79
    "arios"        0.76
    "ibly"         0.75
    "aceutical"    0.72
    "estinal"      0.71
    "ibility"      0.71
    " Attention"   0.71
    "escription"   0.67
    "Reply"        0.67
    Activation Density: 0.016%

    No Known Activations