    Explanations

    verbs or phrases indicating information disclosure or lack thereof

    statements related to non-disclosure and specification of information

    Negative Logits
    "joice"          -0.63
    "ngth"           -0.61
    "jam"            -0.60
    "ruction"        -0.60
    "oir"            -0.60
    "ét"             -0.59
    "cosystem"       -0.58
    "ãĤ¨"            -0.56
    "CHA"            -0.56
    " âī"            -0.54

    Positive Logits
    " specifics"      1.33
    " whether"        1.29
    " nor"            1.03
    "whether"         1.02
    " specific"       1.00
    " why"            0.99
    " particulars"    0.99
    " any"            0.92
    " how"            0.91
    " exact"          0.89

    Activation Density: 0.142%

    No Known Activations