INDEX
    Explanations
    NEGATIVE LOGITS
    " outlet"        -0.07
    " again"         -0.07
    " Silent"        -0.07
    "/event"         -0.07
    " backed"        -0.07
    "Client"         -0.07
    "Wait"           -0.07
    " Front"         -0.06
    " Steak"         -0.06
    " Clint"         -0.06
    POSITIVE LOGITS
    " morph"         0.18
    " Morph"         0.16
    "morph"          0.10
    " Mor"           0.08
    " ambassador"    0.08
    "mpp"            0.08
    " morphology"    0.08
    " MPS"           0.07
    "orph"           0.07
    " رئیس"          0.07
    Activation Density: 0.007%

    No Known Activations