    Explanations
    Negative Logits
    copy          -0.07
    _lastname     -0.06
    thora         -0.06
     dalle        -0.06
    altern        -0.06
    Liver         -0.06
    уру           -0.06
     SOM          -0.06
    _dw           -0.06
                  -0.06
    Positive Logits
     exciting     0.12
     excited      0.12
     excitement   0.08
     explodes     0.07
     quickly      0.07
     esc          0.07
    XR            0.07
     dominate     0.07
    GC            0.07
    -seeking      0.06
    Activation Density: 0.012%

    No Known Activations