INDEX
    Explanations

    Specific words in informative text that signal transitions or the start of a new part of the text

    Negative Logits
    illet     -0.80
    atron     -0.72
    bled      -0.72
    ade       -0.71
    tnc       -0.70
    iola      -0.70
    rium      -0.70
    isable    -0.69
    enter     -0.69
    elled     -0.69
    Positive Logits
     acknowledging    1.19
     researching      0.99
     conced           0.94
     browsing         0.94
     discussing       0.91
     agreeing         0.86
     mentioning       0.84
     dismissing       0.83
     admitting        0.83
     respecting       0.82
    Activation Density 0.048%

    No Known Activations