    Negative Logits
    Logit   Token
    -0.07   " SUMMARY"
    -0.07   " попу"
    -0.07   "นโย"
    -0.07   " obscene"
    -0.07   "siblings"
    -0.07   " وكان"
    -0.07   "仍将"
    -0.06   "-wall"
    -0.06   " ho"
    -0.06   "_JOB"
    Positive Logits
    Logit   Token
    0.06    "_center"
    0.06    (blank)
    0.06    " beer"
    0.06    " ----------------------------------------------------------------"
    0.06    " Infant"
    0.06    "itecture"
    0.06    " Finland"
    0.06    "idental"
    0.06    " bg"
    0.06    " Monterey"
    Activation Density: 0.012%

    No Known Activations