    NEGATIVE LOGITS
    Token            Logit
    "Origins"        0.66
    "₂+"             0.63
    " auxilia"       0.63
    "zsche"          0.63
    " benefits"      0.63
    " benefici"      0.63
    "观测"           0.62
    " beneficial"    0.61
    "amén"           0.61
    " aiding"        0.61
    POSITIVE LOGITS
    Token            Logit
    " NEVER"         1.37
    " PLEASE"        1.36
    " DO"            1.33
    " NO"            1.32
    " MOST"          1.29
    " BEFORE"        1.28
    " STOP"          1.27
    " ALWAYS"        1.24
    " DON"           1.23
    " WE"            1.22
    Activation Density: 0.372%

    No Known Activations