INDEX
    Explanations

    phrases indicating clarification or explanation

    conditional statements and qualifications in the text

    Negative Logits
    Token          Logit
    " awoken"      -0.76
    "anded"        -0.72
    "accompan"     -0.65
    "handled"      -0.62
    "fed"          -0.61
    "footed"       -0.61
    "figured"      -0.59
    "oru"          -0.57
    " voic"        -0.57
    "ļéĨĴ"         -0.57
    Positive Logits
    Token              Logit
    " necessarily"     0.88
    " exaggeration"    0.86
    " anymore"         0.81
    " exagger"         0.76
    " lightly"         0.71
    " dissu"           0.71
    "imilar"           0.71
    "azes"             0.69
    " discouraged"     0.69
    " anything"        0.68
    Activation Density: 0.414%

    No Known Activations