INDEX
    Explanations

    phrases related to attention or awareness

    Negative Logits
    "hist"          -0.15
    "747"           -0.14
    "ILLA"          -0.14
    "مول"           -0.13
    "redo"          -0.13
    "ichni"         -0.13
    "yme"           -0.13
    "esus"          -0.13
    ".INSTANCE"     -0.13
    "chie"          -0.13
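    Positive Logits
    " attention"     1.36
    " Attention"     1.16
    "attention"      1.16
    "Attention"      1.04
    " atención"      0.84   (Spanish, "attention")
    " внимание"      0.84   (Russian, "attention")
    " attent"        0.83
    "_attention"     0.80
    " вним"          0.72   (Russian, prefix of "внимание")
    "注意"           0.68   (Chinese, "attention")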
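    A minimal sketch of how logit lists like these could be produced, assuming each value is the feature's decoder direction multiplied by the model's unembedding matrix (a direct logit effect that ignores the final layer norm). That assumption, the helper name `top_logit_effects`, its arguments, and the toy vocabulary are all hypothetical; this is not necessarily how this page computes its numbers.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by decoder_dir @ W_U.

    Assumes W_U is stored as d_model rows by vocab_size columns and that no
    layer norm is applied before the unembedding (an assumption).
    """
    # Direct logit effect per vocabulary entry: dot product of the feature's
    # decoder direction with each column of the unembedding matrix.
    scores = [sum(d * w for d, w in zip(decoder_dir, column)) for column in zip(*W_U)]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(scores)), key=scores.__getitem__)
    negative = [(vocab[i], scores[i]) for i in order[:k]]
    positive = [(vocab[i], scores[i]) for i in reversed(order[-k:])]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
vocab = [" a", " b", " attention", " c"]
W_U = [
    [0.5, -0.2, 1.3, 0.0],
    [9.0, 9.0, 9.0, 9.0],
]
decoder_dir = [1.0, 0.0]
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

    Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; with a real model the same calculation would run over the full vocabulary with k = 10.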
    Act Density 0.221% (fraction of tokens on which the feature activates)
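    Assuming activation density means the share of token positions where the feature's activation is above zero, it can be computed as sketched below. The helper `activation_density`, the zero threshold, and the toy activation list are hypothetical illustrations, not taken from this page.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of token positions where the feature's
    activation exceeds `threshold` (assumed to be 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires.
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Toy example: the feature fires on 221 of 100,000 tokens (made-up data).
acts = [1.0] * 221 + [0.0] * (100_000 - 221)
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

    At this density the feature would fire on roughly 2 tokens in every 1,000.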

    No Known Activations