INDEX
    Explanations

    mentions of verbs that describe some change in state or position

    Negative Logits
    Token           Logit
    "wrote"         -1.52
    "took"          -1.46
    " withdrew"     -1.45
    "grew"          -1.42
    " froze"        -1.41
    " flew"         -1.41
    " wore"         -1.41
    "threw"         -1.39
    "knew"          -1.39
    " undertook"    -1.37
    Positive Logits
    Token           Logit
    " taken"        1.10
    " given"        1.02
    " shown"        0.89
    " done"         0.86
    " seen"         0.86
    " a"            0.82
    " to"           0.74
    " come"         0.73
    " in"           0.73
    " up"           0.72
    Activation Density: 4.355%

    No Known Activations