    Explanations

    expressions of surprise or shock

    Negative Logits
    "rex"        -0.14
    "xsd"        -0.13
    "&&!"        -0.13
    " подв"      -0.13
    "_IOC"       -0.13
    " Radius"    -0.13
    " press"     -0.13
    "ropy"       -0.13
    ".tap"       -0.13
    "яд"         -0.13
    Positive Logits
    "ordova"     0.17
    "мир"        0.16
    " Kidd"      0.15
    " Teh"       0.15
    "ingly"      0.15
    " WTF"       0.15
    "naz"        0.14
    " Sala"      0.14
    "prises"     0.14
    " surprise"  0.14
    Act Density 0.237%

    No Known Activations