INDEX
    Negative Logits
    Token         Logit
    "+'\"         -0.07
    "roje"        -0.07
    ".shop"       -0.07
    "طبيق"        -0.06
    " timid"      -0.06
    " Timer"      -0.06
    " timer"      -0.06
    " kW"         -0.06
    "rimon"       -0.06
    " hedge"      -0.06
    Positive Logits
    Token         Logit
    " pasta"      0.08
    " spaghetti"  0.08
    "die"         0.08
    " Background" 0.07
    " noodles"    0.07
    " nood"       0.07
    "oodles"      0.07
    " Pasta"      0.07
    " costume"    0.06
    "chner"       0.06
    Activation Density: 0.007%

    No Known Activations