INDEX
    Explanations

    phrases regarding attention and engagement

    Negative Logits
    Token         Logit
    "apo"         -0.14
    "venir"       -0.14
    "üçük"        -0.14
    " omas"       -0.14
    "887"         -0.14
    "éal"         -0.13
    "antu"        -0.13
    "itespace"    -0.13
    "ancial"      -0.12
    "odyn"        -0.12
    Positive Logits
    Token           Logit
    " attention"    1.04
    "attention"     0.88
    " Attention"    0.84
    "Attention"     0.76
    " attent"       0.65
    " atención"     0.63
    "_attention"    0.63
    " внимание"     0.61
    "注意"           0.55
    " attn"         0.53
    Activation Density: 0.177%

    No Known Activations