INDEX
    Explanations
    NEGATIVE LOGITS
    "Autoritní"       -1.04
    " Roskov"         -0.89
    " poveznice"      -0.85
    " препратки"      -0.83
    ")_/¯"            -0.79
    "PerformLayout"   -0.78
    "Jeografia"       -0.78
    " Савезне"        -0.75
    "AndEndTag"       -0.75
    "انيف"            -0.73
    POSITIVE LOGITS
    " he"        0.59
    " that"      0.57
    "He"         0.55
    " a"         0.52
    " an"        0.49
    "That"       0.49
    " another"   0.49
    "She"        0.48
    "that"       0.46
    " That"      0.45
    Activation Density: 0.013%

    No Known Activations