INDEX
    NEGATIVE LOGITS
    .Delete       -0.08
    (whitespace)  -0.07
    COL           -0.07
    opathic       -0.07
     unequal      -0.07
     Strap        -0.06
    .VALUE        -0.06
     resize       -0.06
     Ore          -0.06
    IMIZE         -0.06
    POSITIVE LOGITS
     Antworten    0.07
    -ap           0.06
     produced     0.06
    jas           0.06
     releasing    0.06
     humiliation  0.06
     meisten      0.06
    時の          0.06
    Und           0.06
    ٬             0.06
    Activation Density 0.015%

    No Known Activations