    Explanations

    terms related to stopping or halting actions

    Negative Logits
    Token        Logit
    "/loose"     -0.15
    " Dün"       -0.13
    " Evet"      -0.13
    "ledik"      -0.13
    " inv"       -0.13
    "ervers"     -0.13
    "ullan"      -0.12
    "ague"       -0.12
    "/MPL"       -0.12
    "esome"      -0.12
    Positive Logits
    Token        Logit
    " stop"      0.87
    " stops"     0.77
    " stopped"   0.77
    " STOP"      0.77
    " Stop"      0.77
    "-stop"      0.76
    " stopping"  0.74
    "stop"       0.73
    " halt"      0.71
    "Stop"       0.71
    Activation Density: 0.218%
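
The density figure above is usually read as the fraction of token positions on which the feature fires. A minimal sketch of that calculation follows, assuming per-token activations are available as a list of floats and that "active" means strictly greater than a threshold of zero. The function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is
    strictly greater than `threshold` (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 218 active positions out of 100,000 tokens.
sample = [1.0] * 218 + [0.0] * 99_782
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens, which is consistent with a feature tied to a narrow family of "stop"-like tokens.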

    No Known Activations