INDEX
    Explanations

    phrases expressing intent or meaning

    phrases that assert disclaimers or qualifications

    Negative Logits
    Token            Logit
    "ku"             -0.77
    " awoken"        -0.67
    "busters"        -0.67
    "anded"          -0.64
    " bonds"         -0.63
    "knit"           -0.63
    "dt"             -0.62
    " Commands"      -0.61
    "locked"         -0.60
    "than"           -0.58
    Positive Logits
    Token            Logit
    " necessarily"   0.82
    " nor"           0.80
    " anymore"       0.80
    " exaggeration"  0.79
    " disrespect"    0.77
    "hesda"          0.76
    " anyone"        0.76
    " lightly"       0.75
    " discouraged"   0.72
    " anybody"       0.72
    Activation Density: 0.226%

    No Known Activations