INDEX
    Explanations
    Negative Logits
    Token               Logit
    " infants"          -0.07
    "(ob"               -0.07
    " condoms"          -0.07
    "beta"              -0.07
    " spontaneously"    -0.06
    " kvinde"           -0.06
    "difficulty"        -0.06
    "kup"               -0.06
    " sunset"           -0.06
    " Rainbow"          -0.06
    Positive Logits
    Token               Logit
    "Press"             0.14
    " Press"            0.14
    " press"            0.13
    "PRESS"             0.11
    " presses"          0.09
    "press"             0.09
    " PRESS"            0.09
    " pressing"         0.09
    "_press"            0.08
    ".press"            0.08
    Activation Density: 0.015%

    No Known Activations