    Negative Logits
    Token                   Logit
    [token not captured]    -0.07
    [token not captured]    -0.06
    " biases"               -0.06
    ".fetchone"             -0.06
    " TOKEN"                -0.06
    " bureaucracy"          -0.06
    "Delayed"               -0.06
    ".matches"              -0.06
    " Kurdistan"            -0.06
    " hizmeti"              -0.06
    Positive Logits
    Token                   Logit
    "iaz"                   0.07
    ".Sys"                  0.07
    " reads"                0.06
    "aled"                  0.06
    "payment"               0.06
    " cannons"              0.06
    " singers"              0.06
    " menu"                 0.06
    " superstar"            0.06
    "mut"                   0.06
    Activation Density: 0.013%

    No Known Activations