    Explanations

    words indicating thresholds, limits, or comparison

    Negative Logits
        " Wikimedijinoj"           -0.66
        "הערות"                    -0.60
        (token not displayed)      -0.60
        "كويكب"                    -0.60
        "httphttps"                -0.59
        "ніципа"                   -0.59
        " Signalez"                -0.59
        " Paglinawan"              -0.58
        " fap"                     -0.58
        " dreamstime"              -0.58
    Positive Logits
        " full"                     0.73
        " ***!"                     0.54
        " optimal"                  0.52
        "balleur"                   0.50
        " FULL"                     0.49
        " InputDecoration"          0.49
        " Full"                     0.47
        "full"                      0.47
        "fromnode"                  0.47
        "ној"                       0.47
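The two logit lists are easier to read alongside a sketch of how such lists are commonly produced for sparse-autoencoder features. The block below assumes (this page does not say so) that each score is the feature's decoder direction dotted with the model's unembedding column for that token, ignoring any final LayerNorm. `logit_effects`, `top_and_bottom`, the token names, and the toy vectors are all hypothetical and do not reproduce the values listed above.

```python
# Sketch only: ASSUMES logit scores = decoder direction . unembedding column.
# All names and numbers here are hypothetical toy data.

def logit_effects(decoder_dir, w_unembed):
    """Score each vocabulary token j as sum_i decoder_dir[i] * w_unembed[i][j]."""
    vocab_size = len(w_unembed[0])
    return [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]

def top_and_bottom(tokens, scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative

# Toy model: 3-dimensional residual stream, 5-token vocabulary.
tokens = ["A", "B", "C", "D", "E"]
decoder_dir = [1.0, 0.5, -0.5]
w_unembed = [
    [0.6, 0.4, -0.3, -0.2, 0.1],
    [0.2, 0.1, -0.4, -0.5, 0.0],
    [-0.1, 0.0, 0.2, 0.1, 0.2],
]

scores = logit_effects(decoder_dir, w_unembed)
positive, negative = top_and_bottom(tokens, scores, k=2)
print("positive:", positive)
print("negative:", negative)
```

On a real model the same ranking runs over the full vocabulary, which is why odd subword and non-English tokens can surface at the extremes of both lists.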
    Activation Density: 2.856%

    No Known Activations
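Activation density is commonly reported as the percentage of token positions on which a feature fires. The sketch below assumes that definition with a firing threshold of zero; the threshold behind the 2.856% figure is not stated here, and `activation_density` and the sample activations are hypothetical.

```python
# Sketch only: ASSUMES density = share of tokens with activation > threshold.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical activations of one feature over 10 tokens: it fires on 2 of them.
acts = [0.0, 0.0, 1.3, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")  # 20.000%
```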