INDEX
    Explanations

    sentences where a statement is made followed by a subsequent clarification or additional information

    phrases that indicate contradictory or nuanced statements

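    Negative Logits
    Token            Logit
    "hiba"           -0.77
    " sidx"          -0.73
    " sqor"          -0.69
    "everal"         -0.66
    " Cosponsors"    -0.65
    "aminer"         -0.64
    "roxy"           -0.64
    " Uriel"         -0.62
    "perty"          -0.61
    " Quote"         -0.60

    Positive Logits
    Token            Logit
    " necessarily"    1.22
    " anymore"        1.04
    " anything"       1.00
    "nor"             0.99
    " nor"            0.94
    " magically"      0.90
    " any"            0.88
    " ANY"            0.80
    " infall"         0.79
    "anything"        0.79

    (Quotes mark token boundaries; a leading space is part of the token, so "nor" and " nor" are distinct tokens.)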
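    The page does not say how these logit scores are produced; a common approach (an assumption here) projects the feature's decoder direction onto the model's unembedding and then ranks every vocabulary token by the result. The sketch below covers only the ranking step, using a few of the values listed above as sample data. The helper name `top_logits` is hypothetical.

```python
from __future__ import annotations


def top_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative
    (token, logit) pairs from a mapping of token -> logit effect."""
    # Sort ascending by logit value, so the most negative entries come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative


# Sample values copied from the lists above (leading spaces are part of tokens).
effects = {
    " necessarily": 1.22,
    " anymore": 1.04,
    "nor": 0.99,
    "hiba": -0.77,
    " sidx": -0.73,
    " sqor": -0.69,
}
positive, negative = top_logits(effects, 2)
print(positive)
print(negative)
```

    Keeping tokens as exact strings (including leading spaces) is what lets "nor" and " nor" appear as separate entries.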
    Activation density: 0.299%
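    Activation density is read here (an assumption, since the page gives only the number) as the percentage of sampled tokens on which the feature's activation is greater than zero. A minimal sketch of that calculation, with a hypothetical helper name:

```python
def activation_density(activations: list[float]) -> float:
    """Hypothetical helper: percentage of tokens where the feature fires
    (activation strictly greater than zero)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Illustrative data only: 3 active tokens out of 1,000 sampled.
sample = [0.0] * 997 + [1.5, 0.2, 3.0]
print(activation_density(sample))
```

    Under this reading, a density of 0.299% means the feature fires on roughly 3 of every 1,000 tokens.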

    No Known Activations