INDEX
    Explanations

    words that indicate failure or shortcomings

    Negative Logits
    Token            Logit
    "ization"        -0.86
    "Vidite"         -0.84
    "AndEndTag"      -0.84
    "Demografie"     -0.83
    "θρώ"            -0.79
    "OGND"           -0.78
    " uVar"          -0.77
    "NUMX"           -0.75
    " Elbe"          -0.75
    "Rüyada"         -0.75
    Positive Logits
    Token            Logit
    " fail"          2.15
    " fails"         2.04
    " failed"        2.02
    " Fail"          1.94
    " Failed"        1.88
    "fail"           1.86
    "failed"         1.80
    "fails"          1.79
    " Fails"         1.77
    "Fail"           1.77
    Activation Density: 0.073%

    No Known Activations