    Explanations

    phrases related to non-responsiveness or denials

    Negative Logits
    Token            Logit
    righteousness    -0.75
    injust           -0.70
    Rebellion        -0.68
    mediocre         -0.68
    Empires          -0.66
    proportions      -0.65
    Levels           -0.65
    Conversation     -0.65
    Ribbon           -0.64
    horizont         -0.63

    Positive Logits
    Token            Logit
    disclose          1.11
    disclosed         1.08
    specify           1.06
    formally          1.03
    divul             0.94
    mention           0.91
    elabor            0.91
    disclosing        0.91
    necessarily       0.90
    explain           0.90

    Activation Density: 0.088%

    No Known Activations