    Negative Logits
     afternoon     -0.08
    ji             -0.08
    consistent     -0.07
    _task          -0.07
                   -0.07
    .blog          -0.07
     Ji            -0.07
    reach          -0.07
     associated    -0.07
     leveraging    -0.07

    Positive Logits
     klein          0.08
     najwięks       0.07
     chants         0.07
                    0.07
     médec          0.07
    涂料            0.07
     flam           0.07
    Ǎ               0.07
    بالغ            0.06
     animator       0.06
    Activation Density: 0.005%

    No Known Activations