    Negative Logits
     ulcer      -0.07
    .dropout    -0.07
    backward    -0.07
     Mori       -0.06
     restarted  -0.06
     Peterson   -0.06
    attrib      -0.06
     Ferm       -0.06
     slowing    -0.06
     напит      -0.06
    Positive Logits
     lol        0.07
    NOP         0.07
     hiç        0.06
     pure       0.06
     bigotry    0.06
    手に          0.06
                0.06
     Verse      0.06
     отримання  0.06
    ẳng         0.06
    Activation Density: 0.058%

    No Known Activations