INDEX
    Explanations

    instances where "pay" or "paying attention" is mentioned

    Negative Logits
    Token          Logit
    "anka"         -0.08
    "MN"           -0.07
    "ILED"         -0.07
    " æ¿"          -0.07
    "chen"         -0.07
    "plex"         -0.07
    "redicate"     -0.07
    "ogl"          -0.06
    "rip"          -0.06
    "ouz"          -0.06
    Positive Logits
    Token          Logit
    "ırak"         0.07
    "skirts"       0.07
    " closely"     0.07
    " carefully"   0.07
    "OWL"          0.06
    "459"          0.06
    "agus"         0.06
    "789"          0.06
    "oulouse"      0.06
    " attent"      0.06
    Activation Density: 0.004%

    No Known Activations