INDEX
    Explanations

    instances or mentions of examples in various contexts

    Negative Logits
    lander      -0.19
    ern         -0.17
    ernes       -0.17
    elper       -0.17
    ernet       -0.17
     exemplo    -0.17
    speaker     -0.16
    erness      -0.16
    omo         -0.15
    /Dk         -0.15
    Positive Logits
    d           0.26
    e           0.21
    えば        0.20
    sto         0.19
     taken      0.18
    /tutorial   0.18
    /template   0.18
    OfWork      0.18
     cited      0.17
     sake       0.16
    Activation Density: 0.053%

    No Known Activations