    Explanations

    phrases that introduce explanations or additional information

    statements that include the phrase "which is," indicating clarification or elaboration

    Negative Logits
    Token            Logit
    "actory"         -0.82
    "rongh"          -0.74
    "otte"           -0.71
    "ievers"         -0.68
    "ecake"          -0.67
    "iating"         -0.66
    "icators"        -0.65
    "Numbers"        -0.65
    "igraph"         -0.64
    "ependence"      -0.63
    Positive Logits
    Token                Logit
    " why"               1.27
    " admittedly"        1.13
    " understandable"    1.04
    " basically"         1.02
    " presumably"        0.98
    " ironic"            0.95
    " supposed"          0.95
    " essentially"       0.90
    " probably"          0.89
    " obviously"         0.89
    Activation Density: 0.121%

    No Known Activations