INDEX
    Explanations
    NEGATIVE LOGITS
    Token           Logit
    ".yy"           -0.08
    "Fuck"          -0.07
    " bree"         -0.07
    " straw"        -0.07
    " sides"        -0.07
    " aspect"       -0.07
    "wk"            -0.07
    " Woo"          -0.07
    "-piece"        -0.06
    " multipart"    -0.06
    POSITIVE LOGITS
    Token           Logit
    " Quando"       0.07
    " generators"   0.07
    " potassium"    0.07
    " generator"    0.07
                    0.06
    " quint"        0.06
    " включ"        0.06
    "igest"         0.06
                    0.06
    " система"      0.06
    Activation Density: 0.005%

    No Known Activations