INDEX
    Explanations

    phrases related to accomplishing tasks or decisions

    occurrences of the word "the."

    Negative Logits
    Token           Logit
    "strate"        -0.73
    "wr"            -0.66
    "ufact"         -0.65
    "Cur"           -0.64
    "Style"         -0.64
    " exting"       -0.62
    " greeted"      -0.61
    "tions"         -0.60
    "tion"          -0.59
    "epad"          -0.59
    Positive Logits
    Token           Logit
    " slightest"    1.10
    " mistake"      1.09
    " leap"         1.01
    " pilgrimage"   0.99
    " distinction"  0.96
    " same"         0.95
    " decision"     0.94
    " rounds"       0.94
    " transition"   0.94
    " difference"   0.91
    Activation Density: 0.038%

    No Known Activations