    Negative Logits
    Token          Logit
    " Computer"    -0.08
    "Computer"     -0.08
    " zal"         -0.08
    "computer"     -0.08
    " whiten"      -0.08
    " computer"    -0.08
    "Exclude"      -0.07
    " geldt"       -0.07
    "Artifact"     -0.07
    " ilyen"       -0.07
    Positive Logits
    Token          Logit
    "ද්"            0.08
    "werden"       0.08
    ".mainloop"    0.08
    " nuanced"     0.08
    " Gradu"       0.08
    " دقیق"        0.07
    "দের"          0.07
    "AREN"         0.07
    " Vert"        0.07
    " IDM"         0.07
    Act Density 0.017%

    No Known Activations