    Explanations

    references to political entities and affiliations

    nationalities and foreign terms

    Negative Logits
     autorytatywna      -0.60
     surla              -0.53
    TagMode             -0.53
     ujednoznacz        -0.52
     الرياضيه           -0.51
    RegressionTest      -0.50
     itſelf             -0.50
    Wies                -0.47
    iffance             -0.47
     itself             -0.47
    Positive Logits
    .*")]               0.38
    tagHelperRunner     0.38
     isComment          0.37
    (no token text)     0.36
    arrings             0.35
    iotensin            0.35
    cientos             0.34
    mtliche             0.32
    🏻                  0.32
     Peruvian           0.31
    Activation Density: 0.030%

    No Known Activations