INDEX
    Explanations

    terms related to political entities or actions

    instances of the placeholder token, suggesting the feature responds to structural or formatting elements in the text

    Negative Logits
        " welf"          -0.62
        " Rover"         -0.61
        " Wem"           -0.61
        " checkpoints"   -0.60
        " heights"       -0.60
        " wip"           -0.59
        " streak"        -0.57
        " beginnings"    -0.57
        "gypt"           -0.57
        " Emerson"       -0.56
    Positive Logits
        "venient"        1.42
        "secut"          1.39
        "cerned"         1.34
        "stant"          1.33
        "cern"           1.31
        "crete"          1.30
        "ventional"      1.29
        "cept"           1.28
        "verted"         1.27
        "structed"       1.27
    Activation Density: 0.033%

    No Known Activations