INDEX
    Explanations

    phrases that indicate important considerations or reminders

    phrases that emphasize the importance of consideration or awareness

    Negative Logits
    "urated"    -0.71
    "aired"     -0.71
    "vous"      -0.66
    "ibur"      -0.65
    "thro"      -0.60
    "ping"      -0.60
    "idal"      -0.59
    "gui"       -0.59
    "ãĥı"       -0.58
    "Xi"        -0.58
    Positive Logits
    " lest"          0.92
    " caveats"       0.84
    " beware"        0.71
    " caveat"        0.71
    "³³³"            0.70
    " disclaimer"    0.69
    " that"          0.68
    " though"        0.68
    " WHY"           0.65
    " secondly"      0.65
    Activation Density: 0.123%

    No Known Activations