    Explanations

    Commands and instructions related to enabling or disabling features in software or systems.

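    Negative Logits
    (Tokens are quoted so that leading spaces are visible.)

    Token              Logit
    "-"                -0.63
    (token not shown)  -0.63
    "/"                -0.60
    "?"                -0.56
    ","                -0.52
    " ("               -0.52
    "("                -0.52
    " N"               -0.51
    "l"                -0.50
    "<eos>"            -0.50
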
    Positive Logits

    Token              Logit
    " disable"         2.23
    " Disable"         1.96
    "disable"          1.94
    "Disable"          1.70
    " disabling"       1.62
    " disables"        1.48
    " deactivate"      1.23
    " DISABLE"         1.22
    "DISABLE"          1.20
    " Enable"          1.15

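    The logit tables above list the tokens this feature most strongly promotes and suppresses. SAE dashboards commonly compute such values as the feature's direct logit effect: the feature's decoder vector dotted with each token's column of the model's unembedding matrix (W_U). The sketch below assumes that convention. The function names, the four-token vocabulary, and every number are hypothetical toy values, not this model's real weights.

```python
# Minimal sketch of a feature's direct logit effect.
# Assumption: dashboard "logits" = decoder_vec . W_U[:, token] for each token.
# All weights and tokens below are hypothetical toy values.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: score}, where score is decoder_vec dotted with that token's unembedding column."""
    scores = {}
    for j, token in enumerate(vocab):
        column = [row[j] for row in unembed]  # column j of the d_model x vocab matrix
        scores[token] = sum(d * u for d, u in zip(decoder_vec, column))
    return scores


def top_k(scores, k, positive=True):
    """Return the k highest-scoring tokens (positive=True) or the k lowest (positive=False)."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=positive)
    return ordered[:k]


# Toy model: d_model = 3, vocabulary of 4 tokens (hypothetical).
vocab = [" disable", " Enable", "(", ","]
unembed = [  # shape: d_model x vocab_size
    [1.0, 0.5, -0.2, 0.0],
    [0.5, 0.5, 0.0, -0.3],
    [0.0, 0.2, -0.4, -0.1],
]
decoder_vec = [1.5, 0.5, 0.2]  # hypothetical decoder direction for the feature

scores = logit_effects(decoder_vec, unembed, vocab)
print(top_k(scores, 2, positive=True))   # " disable" ranks first, then " Enable"
print(top_k(scores, 2, positive=False))  # "(" is the most suppressed, then ","
```

    Sorting in both directions from one score dictionary is what lets a single computation fill both the positive and the negative table.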
    Activation density: 0.035%
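    Activation density is the share of tokens on which the feature fires. At 0.035%, that is about 35 tokens per 100,000, or roughly 1 in every 2,860. The sketch below assumes that "fires" means an activation strictly above zero. The function name and the synthetic activation list are hypothetical.

```python
# Minimal sketch: activation density as a percentage of token positions.
# Assumption: a feature "fires" when its activation exceeds `threshold` (default 0).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 35 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(35):
    acts[i * 1000] = 1.0  # hypothetical nonzero activations

density = activation_density(acts)
print(f"Activation density: {density:.3f}%")  # prints "Activation density: 0.035%"
```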

    No Known Activations