    NEGATIVE LOGITS
    "ieved"        -0.08
    " locomotive"  -0.08
    " وأضاف"  -0.08
    " pok"         -0.08
    " SOM"         -0.07
    " driveway"    -0.07
    " pobl"        -0.07
    " resc"        -0.07
    " cultivated"  -0.07
    " coaster"     -0.07
    POSITIVE LOGITS
    " lept"        0.08
    "fighters"     0.08
    "τι"           0.08
    "workers"      0.08
    "_blue"        0.08
    (token not shown)  0.08
    " सामान"  0.07
    " lots"        0.07
    (token not shown)  0.07
    " potent"      0.07
    Activation Density: 0.008%

    No Known Activations