    Negative Logits
     Dynam     -0.07
     Lance     -0.07
    くと       -0.07
     friction  -0.07
    Summary    -0.07
    riculum    -0.06
     Rud       -0.06
    guarded    -0.06
     guy       -0.06
     essay     -0.06
    Positive Logits
    MPI        0.07
     ''        0.06
    ↵          0.06
    ",'        0.06
     olacağ    0.06
     hợp       0.06
     Rom       0.06
    annie      0.06
    _io        0.06
    やる夫     0.06
     hoje      0.06
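The page does not say how these logit values are produced. One common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and list the vocabulary tokens with the most negative and most positive results. This minimal sketch uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy pure-Python lists instead of real model weights.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: unembed[i][j] is the weight from residual-stream dimension i
    to vocabulary token j, so the result has one entry per vocabulary token.
    """
    d_model = len(decoder_dir)
    vocab_size = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
            for j in range(vocab_size)]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked))[:k]   # most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]])
negative, positive = top_and_bottom(effects, ["a", "b", "c"], k=1)
print(negative, positive)  # [('b', -0.2)] [('a', 0.5)]
```

With real weights the same computation runs over the full vocabulary, where an array library is the practical choice; the values on this page appear to be rounded to two decimal places.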
    Activation Density: 0.007%

    No Known Activations
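The page does not define activation density. Assuming it is the percentage of sampled tokens on which the feature's activation is above zero, an activation density of 0.007% corresponds to roughly 7 activating tokens per 100,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 7 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_993 + [1.0] * 7
print(f"{activation_density(acts):.3f}%")  # 0.007%
```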