    Explanations

    increase or decrease

    Negative Logits
    Token               Logit
    " kaarangay"        -1.06
    " kasarigan"        -0.85
    "oa̍t"               -0.81
    " noqa"             -0.76
    "parsedMessage"     -0.76
    "ArrowToggle"       -0.75
    " للمعارف"          -0.74
    " дописавши"        -0.73
    "CppCodeGen"        -0.73
    " autorytatywna"    -0.72
    Positive Logits
    Token               Logit
    " higher"           1.24
    "higher"            1.09
    " lower"            0.95
    " HIGHER"           0.91
    "Higher"            0.91
    " Higher"           0.87
    "lower"             0.73
    " höher"            0.72
    " LOWER"            0.64
    " upper"            0.63
    Act Density 0.001%

    No Known Activations