INDEX
    Explanations

    phrases indicating contrast/consequences

    em dashes and phrases that connect ideas or indicate ongoing thoughts

    Negative Logits
    " obser"        -0.84
    " lifes"        -0.77
    " sleeper"      -0.77
    " subsequ"      -0.73
    " cons"         -0.73
    "omorphic"      -0.72
    "ccording"      -0.72
    "itton"         -0.71
    "ĵĺ"            -0.71
    " stabil"       -0.70
    Positive Logits
    "––"            0.87
    " ---"          0.85
    "_-"            0.84
    "[["            0.81
    "————"          0.81
    " WATCHED"      0.81
    " Cosponsors"   0.81
    "►"             0.80
    " ―"            0.80
    "SEE"           0.79
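Several tokens in these logit lists look garbled because the model presumably uses a GPT-2-style byte-level BPE vocabulary, which stores every byte as a printable stand-in character. For example, the raw vocabulary string `âĸº` is the UTF-8 encoding of `►`. The following is a minimal sketch that rebuilds GPT-2's byte-to-character table and decodes such strings. Assumptions: that this vocabulary applies to the model behind this dashboard, and that the listing shows the `Ġ` space marker as a literal space. `decode_token` and `token_bytes` are hypothetical helper names.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2's byte -> printable-character table (as in its encoder.py)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every other byte (controls, space, 0x7F-0xA0, 0xAD) is shifted to 256+n.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


# Inverse table: stand-in character -> original byte.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}
# Assumption: the dashboard renders the "Ġ" space marker as a real space.
BYTE_DECODER[" "] = 0x20


def token_bytes(token: str) -> bytes:
    """Raw bytes behind a byte-level BPE token string."""
    return bytes(BYTE_DECODER[ch] for ch in token)


def decode_token(token: str) -> str:
    """Readable text for a token; leaves already-decoded strings unchanged."""
    if any(ch not in BYTE_DECODER for ch in token):
        return token
    # Invalid or partial UTF-8 sequences become U+FFFD replacement characters.
    return token_bytes(token).decode("utf-8", errors="replace")


for tok in ["âĸº", " âĢķ", "ĵĺ", "––"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

A token such as `ĵĺ` maps to the bytes `0x93 0x98`. These are UTF-8 continuation bytes with no lead byte, so they only form a character when joined with a neighboring token. That is why such tokens cannot be shown as readable text on their own.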
    Activation Density: 0.031%
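Activation density is most likely the share of sampled tokens on which the feature fires, meaning its activation is above zero. At 0.031%, that works out to roughly 1 token in 3,200. The following is a minimal sketch assuming that definition and a zero threshold. `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Firing on 31 of 100,000 tokens corresponds to a density of 0.031%.
print(activation_density([1.0] * 31 + [0.0] * 99_969))
```

Whether the real dashboard uses a zero threshold or some small positive cutoff is not stated here. The `threshold` parameter is exposed so that choice can be varied.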

    No Known Activations