INDEX
    Explanations

    phrases signaling comparison or contrast

    references to the concept of "which" as it pertains to explanations or clarifications in the text

    Negative Logits
    Token        Logit
    "Behind"     -0.76
    "grim"       -0.73
    "let"        -0.68
    "rior"       -0.68
    "Roaming"    -0.62
    "bug"        -0.62
    " Ott"       -0.62
    "hat"        -0.60
    "da"         -0.60
    "lean"       -0.60
    Positive Logits
    Token             Logit
    "soever"          0.90
    " consisted"      0.75
    "akespeare"       0.75
    " corresponds"    0.74
    " originated"     0.73
    " consists"       0.73
    " exceeds"        0.73
    " lasted"         0.71
    " constitutes"    0.71
    " resulted"       0.71
    Activation Density: 0.035%

    No Known Activations