INDEX
    Explanations
    Negative Logits
    ."""↵↵         -0.07
     restored      -0.07
    Hold           -0.07
    .Open          -0.07
     adviser       -0.06
     humorous      -0.06
     praises       -0.06
    PlainText      -0.06
    removeClass    -0.06
    Open           -0.06
    POSITIVE LOGITS
     outcomes      0.07
    _workflow      0.07
                   0.07
     vay           0.06
    pur            0.06
     hur           0.06
     CIM           0.06
    umar           0.06
     Outlet        0.06
                   0.06
    Act Density 0.008%

    No Known Activations