    NEGATIVE LOGITS
    Token         Logit
    " stop"       -2.84
    "stop"        -2.42
    " Stop"       -2.36
    "Stop"        -2.25
    " STOP"       -1.99
    " stops"      -1.85
    "STOP"        -1.80
    " Stops"      -1.71
    " stopped"    -1.63
    "stops"       -1.59
    POSITIVE LOGITS
    Token               Logit
    " iſt"              0.77
    " elevating"        0.71
    "ſelf"              0.70
    " pleaſure"         0.70
    " Efq"              0.68
    "withIdentifier"    0.68
    " gaining"          0.68
    " ſche"             0.67
    " myſelf"           0.67
    " adapting"         0.67
    Activation Density: 0.213%

    No Known Activations