INDEX
    Explanations
    Negative Logits
    Token       Logit
    "hou"       -0.09
    " andere"   -0.06
    "�n"        -0.06
    " Zo"       -0.06
    "ังไม"      -0.06
    "LOG"       -0.06
    "ınıza"     -0.06
    " nouns"    -0.06
    " gcd"      -0.06
    " Porno"    -0.06
    Positive Logits
    Token       Logit
    "remark"    0.07
    ",为"       0.07
    ".Work"     0.06
    "Sent"      0.06
    " sen"      0.06
    "より"      0.06
    " прор"     0.06
    "sim"       0.06
    "resident"  0.06
    ",但"       0.06
    Activation Density: 0.035%
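    Reading activation density as the percentage of tokens on which the feature's activation exceeds zero, 0.035% corresponds to roughly 35 active tokens in every 100,000. A minimal sketch of that calculation follows, assuming a flat list of per-token activations and a zero threshold; the helper `activation_density` is hypothetical, and the dashboard's exact definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation is above
    `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 35 firing tokens out of 100,000 (synthetic data, not from this feature).
    acts = [0.0] * 99_965 + [1.0] * 35
    print(f"{activation_density(acts):.3f}%")
```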

    No Known Activations