    Negative Logits
    Token       Logit
    "eer"       -0.18
    "ezi"       -0.17
    "uido"      -0.16
    "uzzi"      -0.16
    "ecta"      -0.16
    "eil"       -0.15
    "etak"      -0.15
    "hm"        -0.15
    "hem"       -0.15
    "e"         -0.15
    Positive Logits
    Token         Logit
    "nap"         0.32
    "ults"        0.24
    "/ad"         0.23
    "-friendly"   0.22
    "neys"        0.21
    "/people"     0.21
    "friendly"    0.20
    "stuff"       0.20
    " aged"       0.20
    " friendly"   0.20
    Activation Density: 0.022%

    No Known Activations