    Negative Logits
        " deny"       -1.02
        " denial"     -0.96
        " Admit"      -0.90
        " denying"    -0.89
        " denies"     -0.87
        " Denial"     -0.86
        "denial"      -0.85
        "]--;"        -0.84
        "ніципа"      -0.84
        " Deny"       -0.84
    Positive Logits
        " fe"          0.38
        "HexString"    0.38
        " per"         0.35
        " sted"        0.34
        " protested"   0.34
        "कारी"         0.34
        " kró"         0.32
        "TagMode"      0.32
        " tác"         0.32
        " bé"          0.31
    Activation Density: 0.002%

    No Known Activations