INDEX
    Explanations

    words and phrases that indicate attention or interest

    Negative Logits
    Token        Logit
    "odyn"       -0.16
    "omanip"     -0.16
    "ameda"      -0.15
    "icas"       -0.14
    "incinn"     -0.14
    "-BEGIN"     -0.14
    "озв"        -0.14
    "iliz"       -0.14
    "itol"       -0.14
    "avn"        -0.13
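
    Positive Logits
    (quotes show each token exactly, including any leading space)
    Token           Logit
    " attention"    0.65
    "attention"     0.51
    " att"          0.44
    " attent"       0.43
    " Attention"    0.42
    " notice"       0.42
    " ATT"          0.40
    " внимание"     0.39   (Russian for "attention")
    "Attention"     0.39
    " attn"         0.38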
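
    The page does not say how these logit values were computed. A common approach, assumed here, is to project the feature's decoder direction through the model's unembedding matrix W_U and rank the resulting per-token scores. The sketch below illustrates that assumption in pure Python; the function name `top_logits` and its arguments are hypothetical, not part of any specific library.

```python
from typing import Sequence


def top_logits(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: project a feature's decoder direction onto the
    unembedding and return the k most positive and k most negative tokens."""
    d_model = len(decoder_dir)
    # logit[t] = sum over i of decoder_dir[i] * W_U[i][t]
    logits = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score, so the most negative tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy example with d_model = 2 and a three-token vocabulary.
pos, neg = top_logits([0.5, 0.2], [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], ["a", "b", "c"], k=1)
print(pos, neg)
```

    Under this assumption, a feature whose decoder direction aligns with the unembedding rows for " attention", " notice", and their variants would produce a positive list like the one above.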
    Activation Density: 0.070%
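
    Activation density is typically read as the fraction of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The page does not state its corpus or threshold, and the function name `activation_density` is hypothetical. At 0.070%, the feature would be active on roughly 7 of every 10,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where the feature's
    activation exceeds `threshold` (assumed to be 0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 7 active positions out of 10,000 sampled tokens.
acts = [0.0] * 9993 + [1.0] * 7
print(f"{activation_density(acts):.3%}")
```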

    No Known Activations