    Negative Logits
     Sax                    -0.07
     Numer                  -0.07
    cosa                    -0.06
    .Group                  -0.06
     Science                -0.06
     öldür                  -0.06
     dirt                   -0.06
    _DIALOG                 -0.06
    -exp                    -0.06
     هنگام                  -0.06
    Positive Logits
     memory                  0.10
     Memory                  0.09
    Preview                  0.07
     mailbox                 0.06
     locator                 0.06
     وف                      0.06
     did                     0.06
     "]"                     0.06
    Memory                   0.06
     --------------------    0.06
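The logit lists above rank output tokens by how strongly this feature pushes them down or up. A common way feature dashboards derive such lists (an assumption here; this page does not say how its values were computed) is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k most negative and k most positive entries. A minimal pure-Python sketch, where `decoder_dir`, `W_U`, `vocab`, and `top_bottom_logits` are hypothetical names:

```python
def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction through an
    unembedding matrix W_U (d_model rows x vocab_size columns) and return the
    k lowest- and k highest-scoring tokens as (token, logit) pairs."""
    # logits[j] = sum over i of decoder_dir[i] * W_U[i][j]
    logits = [
        sum(d * row[j] for d, row in zip(decoder_dir, W_U))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive logit.
    order = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in order[:k]]
    # Largest logits first.
    positive = [(vocab[j], logits[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy usage: a 2-dimensional model with a 4-token vocabulary.
neg, pos = top_bottom_logits(
    decoder_dir=[1.0, 0.0],
    W_U=[[0.1, -0.2, 0.3, 0.0], [5.0, 5.0, 5.0, 5.0]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
```

In the toy call, the second row of `W_U` is ignored because the decoder direction has a zero in that coordinate, so the logits are just the first row.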
    Activation density: 0.011%
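Activation density is the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page does not state its threshold); `activation_density` is a hypothetical helper:

```python
def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions whose feature
    activation is strictly greater than `threshold`."""
    acts = list(acts)
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# 11 active positions out of 100,000 tokens gives the density shown above.
acts = [1.0] * 11 + [0.0] * (100_000 - 11)
density = activation_density(acts)
print(f"{density:.3%}")
```

At this density the feature fires on roughly one token in every 9,000, so a small sample can easily contain no activating examples at all.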

    No known activations