    Explanations

    phrases related to confidentiality and information control

    Negative Logits
    Token           Logit
    " monot"        -0.15
    "755"           -0.14
    " Hunger"       -0.14
    " Wet"          -0.14
    "668"           -0.13
    "ëıħ"           -0.13
    "arde"          -0.13
    "pler"          -0.13
    ".construct"    -0.13
    "929"           -0.13
    Positive Logits
    Token           Logit
    " sensitive"     0.25
    "-sensitive"     0.23
    " sensitivity"   0.19
    "Sensitive"      0.19
    " protection"    0.19
    "ensitive"       0.19
    "æķı"            0.17
    " protect"       0.17
    " Rey"           0.17
    "ä¿ĿæĬ¤"         0.17
    Activation Density: 0.025%

    No Known Activations
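
Some tokens in the logit lists (for example "æķı" and "ä¿ĿæĬ¤") look like encoding debris. They are more likely raw vocabulary entries from a byte-level BPE tokenizer. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-character mapping, which this page does not confirm, and maps such strings back to readable text. The names `gpt2_byte_to_char` and `decode_token` are illustrative and do not belong to any dashboard API.

```python
def gpt2_byte_to_char():
    """Rebuild the GPT-2 style byte -> printable character table (assumed mapping)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted to code points 256 and above.
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


CHAR_TO_BYTE = {ch: b for b, ch in gpt2_byte_to_char().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to the text it encodes."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal leading space
            # shown by the dashboard) are kept as ordinary UTF-8.
            raw.extend(ch.encode("utf-8"))
    # A token may hold only part of a multi-byte character, so do not fail.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ä¿ĿæĬ¤", "æķı", "ëıħ", " sensitive"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, "ä¿ĿæĬ¤" decodes to 保护 ("protect") and "æķı" to 敏 (the first character of 敏感, "sensitive"), which fits the other positive-logit tokens and the explanation given for this feature. "ëıħ" decodes to the Hangul syllable 독.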