INDEX
    Explanations

    phrases related to directing attention towards a specific topic or action

    phrases related to prioritizing attention or resources

    Negative Logits
    "named"    -0.74
    "OUGH"     -0.74
    "BIT"      -0.72
    "adding"   -0.71
    "mia"      -0.71
    "ania"     -0.68
    "added"    -0.67
    "idden"    -0.63
    "mx"       -0.63
    "anne"     -0.62
    Positive Logits
    "rite"          0.95
    " focus"        0.86
    " squarely"     0.82
    "peed"          0.79
    " attention"    0.79
    " solely"       0.77
    " toward"       0.77
    " focused"      0.76
    " foc"          0.75
    " Attention"    0.74
    Activation Density: 0.031%

    No Known Activations