INDEX
    Explanations

    explanations or clarifications in text

    instances of the word "explained."

    Negative Logits
    "inal"            -0.80
    "illet"           -0.76
    " ILCS"           -0.71
    "engeance"        -0.70
    "venge"           -0.69
    "pired"           -0.69
    "vernment"        -0.69
    " contracted"     -0.66
    "otion"           -0.66
    "ascus"           -0.65
    Positive Logits
    " explains"       1.01
    " why"            0.95
    " WHY"            0.90
    " explained"      0.87
    " explain"        0.83
    "ĸļ"              0.83
    " explaining"     0.82
    "WER"             0.82
    " explanations"   0.81
    " Explain"        0.80
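    Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction onto each token's unembedding vector, then keeping the k most negative and k most positive scores. The sketch below assumes that procedure. The function names and the two-dimensional toy vectors are hypothetical and do not reproduce the values listed above.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the (hypothetical) feature
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k most positive, k most negative) tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example (made-up vectors, not real model weights).
decoder_dir = [1.0, 0.0]
unembed = {
    " explains": [1.0, 0.5],
    " why": [0.9, 0.0],
    "inal": [-0.8, 1.0],
    "venge": [-0.7, 0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)   # [(' explains', 1.0), (' why', 0.9)]
print(negative)   # [('inal', -0.8), ('venge', -0.7)]
```

    Real dashboards may normalize or rescale these scores before display; the sketch shows only the ranking step.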
    Activation Density: 0.017%
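    Activation density is the fraction of tokens on which the feature fires, so 0.017% corresponds to roughly 17 activating tokens per 100,000. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of 0. That threshold and the helper name `activation_density` are hypothetical and are not stated on this page.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    return sum(1 for a in activations if a > threshold) / len(activations)

# 17 firing tokens out of 100,000 (synthetic data for illustration).
acts = [1.0] * 17 + [0.0] * (100_000 - 17)
density = activation_density(acts)
print(f"{density:.3%}")   # 0.017%
```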

    No Known Activations