    Negative Logits
     period         -0.07
     liquid         -0.07
     aide           -0.06
     Samuel         -0.06
     Inst           -0.06
     practitioner   -0.06
    Blocks          -0.06
     familiar       -0.06
     Stuart         -0.06
     instances      -0.06
    Positive Logits
     ك              0.07
    xFE             0.06
    ')+             0.06
    [not shown]     0.06
    alers           0.06
    [not shown]     0.06
     MPU            0.06
    .'));↵          0.06
    [not shown]     0.06
    '];↵↵           0.06
    Activation Density: 0.001%

    No Known Activations