INDEX
    Explanations
    Negative Logits
    aits           -0.07
     Electro       -0.07
    ducible        -0.06
    _Equals        -0.06
     Godzilla      -0.06
     judged        -0.06
     Kos           -0.06
     Alexandria    -0.06
    Mb             -0.06
    fail           -0.06
    Positive Logits
    ")))           0.07
    .btn           0.07
    PR             0.07
                   0.07
    rah            0.07
     clases        0.07
                   0.07
     grátis        0.07
    86             0.07
    .Batch         0.07
    Activation Density: 0.035%

    No Known Activations