INDEX
    Negative Logits
        Token              Logit
        "Chess"            -0.07
        " Nichols"         -0.06
        " IX"              -0.06
        " unless"          -0.06
        " Ally"            -0.06
        "\b"               -0.06
        " ogs"             -0.06
        " Sticky"          -0.05
        " MODEL"           -0.05
        " contribution"    -0.05
    Positive Logits
        Token              Logit
        " medios"          0.07
        (token missing)    0.07
        " exemple"         0.06
        " dinheiro"        0.06
        "voice"            0.06
        " honeymoon"       0.06
        (token missing)    0.06
        " salida"          0.06
        (token missing)    0.06
        " articulated"     0.06
    Act Density 0.005%

    No Known Activations