INDEX
    Negative Logits
     manus  -0.08
    ieron  -0.07
     Dort  -0.07
    ono  -0.07
     disag  -0.07
    缺失  -0.07
     February  -0.07
    ()↵↵↵  -0.07
    March  -0.07
     nota  -0.07
    Positive Logits
    TIMER  0.07
    🏀  0.07
    CTYPE  0.07
    NewItem  0.07
    policy  0.07
    𝐁  0.07
    0.07
    .Designer  0.07
    0.07
    .animations  0.07
    Activation Density: 0.102%

    No Known Activations