    Negative Logits
     stripe         -0.07
     Tasmania       -0.07
                    -0.06
     wissen         -0.06
    eken            -0.06
     necesita       -0.06
     Youtube        -0.06
     Fallon         -0.06
     Trafford       -0.06
     edm            -0.06
    Positive Logits
                    0.07
    _written        0.06
    Пер             0.06
    сте             0.06
    kw              0.06
    cth             0.06
    Composition     0.06
    -size           0.06
    loss            0.06
                    0.06
    Act Density 0.051%

    No Known Activations