INDEX
    Explanations

    phrases indicating caution or warnings

    mentions of being careful or cautious

    NEGATIVE LOGITS
    "upon"    -0.80
    "IRE"     -0.66
    "heid"    -0.64
    "MH"      -0.64
    "flat"    -0.64
    "ono"     -0.62
    "hung"    -0.61
    "obo"     -0.60
    "soon"    -0.60
    "olon"    -0.60
    POSITIVE LOGITS
    " lest"            1.12
    " calibr"          0.80
    " selecting"       0.69
    " when"            0.69
    "rored"            0.69
    " interpreting"    0.69
    "^^"               0.67
    " regarding"       0.66
    "ogical"           0.66
    " stewards"        0.65
    Activation Density: 0.078%

    No Known Activations