INDEX
    Explanations
    Negative Logits
    " ----------"    -0.07
    "erreur"         -0.07
    "SECOND"         -0.06
    ",String"        -0.06
    " ))"            -0.06
    "Clean"          -0.06
    " webpage"       -0.06
    "ивает"          -0.06
    " próximo"       -0.06
    ",number"        -0.06
    Positive Logits
    "Compose"        0.07
    " Fred"          0.07
    " záz"           0.06
    " glow"          0.06
    (blank)          0.06
    "lj"             0.06
    "bw"             0.06
    "edm"            0.06
    " troublesome"   0.06
    " Yine"          0.06
    Activation Density: 0.051%

    No Known Activations