INDEX
    Explanations

    words related to attention or emphasis

    Negative Logits
    Token         Logit
    "ania"        -0.80
    "asca"        -0.75
    "idden"       -0.70
    "named"       -0.68
    "OUGH"        -0.67
    "ston"        -0.65
    "mia"         -0.62
    "ccess"       -0.62
    " paran"      -0.62
    "ibrary"      -0.61
    Positive Logits
    Token         Logit
    "starter"      0.88
    "rite"         0.86
    "Goal"         0.82
    " squarely"    0.82
    " focuses"     0.80
    "ivation"      0.80
    " focus"       0.79
    "fulness"      0.78
    " focused"     0.78
    " toward"      0.74
    Activation Density: 0.025%

    No Known Activations