INDEX
    Explanations

    phrases related to past actions or events

    Negative Logits
        Token          Logit
        "bie"          -0.62
        "PI"           -0.62
        "âϦ"           -0.61
        "bery"         -0.60
        "hack"         -0.60
        "owe"          -0.57
        "hammer"       -0.57
        " Voters"      -0.57
        " trope"       -0.56
        "etting"       -0.56

    Positive Logits
        Token          Logit
        " been"         1.35
        " undergone"    1.15
        " begun"        1.15
        " gone"         1.10
        " gotten"       1.09
        "iths"          1.07
        "been"          1.02
        " previously"   0.98
        " flown"        0.94
        " taken"        0.91
    Activation Density: 0.651%

    No Known Activations