INDEX
    Explanations
    NEGATIVE LOGITS
    Token (quoted)    Logit
    "((["             -0.45
    "jeg"             -0.41
    "|||"             -0.40
    " trustee"        -0.39
    " daß"            -0.38
    " stumped"        -0.37
    "indeki"          -0.37
    " Rudy"           -0.37
    "leta"            -0.35
    " Reihe"          -0.35
    POSITIVE LOGITS
    Token (quoted)    Logit
    " Fashion"        2.08
    " fashion"        2.03
    "Fashion"         1.98
    "fashion"         1.97
    " FASHION"        1.89
    "FASHION"         1.71
    " fashions"       1.38
    "ashion"          1.29
    "ashions"         1.25
    " moda"           1.23
    Act Density: 0.002%

    No Known Activations