INDEX
    Explanations

    words related to concentration or attention

    Negative Logits
    Token        Logit
    "OUGH"       -0.75
    "named"      -0.73
    "adding"     -0.68
    "BIT"        -0.68
    "added"      -0.66
    "oho"        -0.61
    "mia"        -0.61
    "mx"         -0.60
    "Haunted"    -0.60
    "ylon"       -0.60
    Positive Logits
    Token           Logit
    "rite"          0.96
    " focus"        0.86
    " squarely"     0.83
    " attention"    0.82
    " solely"       0.82
    "rals"          0.77
    " Attention"    0.76
    " toward"       0.75
    "ivism"         0.74
    " foc"          0.73
    Activation Density: 0.019%

    No Known Activations