INDEX
    Explanations

    Function words

    Negative Logits
    Token               Logit
    "Express"           -0.07
    "contain"           -0.07
    " Statements"       -0.07
    "孕期"              -0.06
    " pr"               -0.06
    "-year"             -0.06
    " undefeated"       -0.06
    (token not shown)   -0.06
    "Orders"            -0.06
    "Show"              -0.06
    Positive Logits
    Token               Logit
    " Evans"            0.07
    " battling"         0.07
    " radar"            0.07
    " polic"            0.07
    " firing"           0.07
    " onstage"          0.07
    " bogus"            0.07
    " SYS"              0.07
    " chạy"             0.07
    (token not shown)   0.07
    Act Density 0.166%

    No Known Activations