    Negative Logits
    Token       Logit
    "_diff"     -0.07
    " fuss"     -0.07
    " flips"    -0.07
    "ач"        -0.07
    (blank)     -0.07
    "hz"        -0.07
    (blank)     -0.06
    "传球"      -0.06
    " sóc"      -0.06
    " ang"      -0.06
    Positive Logits
    Token           Logit
    ".thumbnail"    0.09
    "domain"        0.08
    " swallowed"    0.08
    "全域"          0.08
    "&oacute"       0.07
    " joyful"       0.07
    "🚼"            0.07
    ".memory"       0.07
    ".document"     0.07
    (blank)         0.07
    Activation Density: 0.003%

    No Known Activations