INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    " alloy"      -0.07
    " OLD"        -0.07
    "OLF"         -0.07
    " keeping"    -0.07
    " KEEP"       -0.07
    "AGIC"        -0.07
    "พย"          -0.07
    "OLT"         -0.06
    "olf"         -0.06
    "PIO"         -0.06
    POSITIVE LOGITS
    Token         Logit
    " hum"        0.10
    " humming"    0.10
    " Hum"        0.09
    " buzz"       0.09
    "Hum"         0.08
    " Buzz"       0.08
    "duced"       0.07
    "um"          0.07
    "ub"          0.07
    " fem"        0.07
    Activation Density: 0.006%

    No Known Activations