    Negative Logits
    Token           Logit
    "ätze"          -0.08
    "Jug"           -0.08
    "Ela"           -0.08
    "Pic"           -0.08
    "Obj"           -0.08
    "giv"           -0.08
    "Ark"           -0.08
    "iów"           -0.07
    "Plat"          -0.07
    "KR"            -0.07
    Positive Logits
    Token           Logit
    " Disability"   0.08
    " rouge"        0.08
    " bantu"        0.08
    " основе"       0.08
    " disability"   0.08
    "kinson"        0.07
    " least"        0.07
    " Least"        0.07
    " Foley"        0.07
                    0.07
    Activation Density: 0.096%

    No Known Activations