    Negative Logits
    Token          Logit
    " rpt"         -0.07
    "deps"         -0.07
    " Exterior"    -0.07
    " },"          -0.07
    "↵↵"           -0.07
    "Ctx"          -0.07
    " Crushing"    -0.06
    "<Task"        -0.06
    "🎯"           -0.06
    " wk"          -0.06
    "-confirm"     -0.06
    Positive Logits
    Token                   Logit
    " hobby"                0.08
    " Colon"                0.07
    "_LAYER"                0.07
    (token text missing)    0.07
    "/oauth"                0.06
    "oa"                    0.06
    "被认为"                0.06
    "Inserted"              0.06
    "ípio"                  0.06
    "聊天"                  0.06
    Act Density 0.004%

    No Known Activations