    NEGATIVE LOGITS
    " Sanctions"        -0.60
    " sanctions"        -0.57
    " certifications"   -0.56
    "UpInside"          -0.54
    " Certifications"   -0.54
    " aDecoder"         -0.52
    "AndEndTag"         -0.51
    " oxidase"          -0.50
    "certification"     -0.50
    " kated"            -0.49
    POSITIVE LOGITS
    "<bos>"             0.64
    " ProtoMessage"     0.58
    "Контак"            0.57
    " Numerade"         0.56
    "ably"              0.56
    "ViewFeatures"      0.54
    "gnąć"              0.54
    "enschappelijke"    0.53
    "roën"              0.53
    "дца"               0.52
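    The page does not state how these token lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix (W_U) and read off the lowest and highest scoring vocabulary tokens. The function name `top_logits` and its argument layout are hypothetical; this is a minimal pure-Python sketch of that technique.

```python
def top_logits(direction, unembed, vocab, k):
    """Rank vocabulary tokens by how strongly a feature direction pushes their logits.

    Assumptions (hypothetical layout, not taken from the source page):
      direction: list[float] of length d_model, e.g. the feature's decoder vector.
      unembed:   d_model rows, each a list[float] of length len(vocab) (W_U).
      vocab:     list[str] of token strings, index-aligned with unembed columns.
    Returns (most_negative, most_positive), each a list of (token, score) pairs.
    """
    # Score for token t is the dot product of the direction with column t of W_U.
    scores = [
        sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative logits.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]  # reverse to put the highest scores first
    return most_negative, most_positive


# Toy usage with a 2-dimensional model and a 3-token vocabulary.
neg, pos = top_logits([1.0, -1.0], [[0.5, 0.0, -0.2], [0.1, 0.3, 0.4]], ["a", "b", "c"], 1)
print(neg, pos)
```

    Real dashboards typically run this over the full vocabulary with a matrix library; the nested loops here only keep the sketch dependency-free.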
    Activation density: 0.039%
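    Activation density is the share of sampled tokens on which the feature fires, so 0.039% corresponds to roughly 39 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the page does not state the threshold or the token sample used, and the function name `activation_density` is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of four gives 25%.
print(activation_density([0.0, 1.2, 0.0, 0.0]))
```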

    No Known Activations