    Explanations

    Logging configuration

    Negative Logits
    "زو"  -0.08
    " erot"  -0.07
    " homosexual"  -0.07
    " المواد"  -0.07
    " erotic"  -0.07
    " finis"  -0.07
    ".metro"  -0.07
    "حص"  -0.07
    "Feet"  -0.07
    " alien"  -0.07
    Positive Logits
    "=logging"  0.09
    "={$"  0.09
    "details"  0.08
    "logging"  0.08
    " STDERR"  0.08
    "=l"  0.08
    "=${"  0.08
    " '${"  0.08
    "=w"  0.08
    " logging"  0.08
    Activation Density: 0.001%

    No Known Activations