    Explanations

    sentence end indicators followed by "but"

    Negative Logits
    " methodologies"    0.35
    " activations"      0.32
    " deliverables"     0.31
    " analogs"          0.31
    " datasets"         0.31
    " applications"     0.30
    " alphan"           0.30
    "Wrapper"           0.29
    " analogues"        0.29
    " benchmarks"       0.29
    Positive Logits
    "you"        0.40
    "but"        0.39
    "why"        0.39
    " Wasn"      0.39
    " didn"      0.38
    " That"      0.38
    " wouldn"    0.38
    "yes"        0.37
    "who"        0.37
    "maybe"      0.37
    Activation Density: 0.103%

    No Known Activations