    Explanations

    Writing and math elements

    Negative Logits
      "stuff"          -0.08
      " postpartum"    -0.07
      " Vir"           -0.07
      "Frank"          -0.07
      "/report"        -0.07
      " Patrick"       -0.07
      "ogie"           -0.07
      " gone"          -0.07
      " condu"         -0.07
      " Stiftung"      -0.07

    Positive Logits
      " trou"           0.08
      " Kru"            0.08
      " же"             0.08
      "bw"              0.07
      " потер"          0.07
      " Joaquin"        0.07
      " boc"            0.07
      " XT"             0.07
                        0.07
      ".rich"           0.07
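    The two lists above rank tokens by how strongly the feature pushes their logits down (negative) or up (positive). The sketch below shows only the ranking step. It assumes the per-token logit effects are already available as a plain dict of token to value. How those values are derived is not stated on this page; projecting the feature's output direction through the model's unembedding is one common approach, but treat that as an assumption. The function name `top_and_bottom_logits` and the toy input are hypothetical.

```python
def top_and_bottom_logits(logit_effects, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    logit_effects: hypothetical dict mapping token string -> logit effect.
    Tokens are kept verbatim, so leading spaces (e.g. " trou") are preserved.
    """
    # Sort ascending by value: most negative effects come first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy input: a few values copied from the lists above plus one made-up token.
    toy = {"stuff": -0.08, " postpartum": -0.07, " trou": 0.08, "bw": 0.07, "neutral": 0.0}
    neg, pos = top_and_bottom_logits(toy, k=2)
    print(neg)
    print(pos)
```

    If the vocabulary has fewer than 2k tokens, the two lists can overlap; a real dashboard covers a full vocabulary, so this edge case does not arise there.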
    Activation Density 0.079%
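    The activation density figure above is read here as the percentage of token positions on which the feature fires. The page does not define it, so this sketch assumes "fires" means an activation strictly greater than zero. The function name and the `threshold` parameter are hypothetical. For example, 79 active positions out of 100,000 gives 0.079%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes a feature counts as active when its activation is > threshold
    (0.0 by default); the dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 79 active positions out of 100,000 token positions.
    acts = [0.0] * 99_921 + [1.0] * 79
    print(f"{activation_density(acts):.3f}%")
```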

    No Known Activations