INDEX
    Explanations
    NEGATIVE LOGITS
    Token                 Logit
    " absoluto"           -0.08
    " necesariamente"     -0.08
    " دف"                 -0.08
    " rued"               -0.08
    " النفس"              -0.08
                          -0.08
    " رضي"                -0.08
                          -0.08
    " durchaus"           -0.08
    " forcément"          -0.08
    POSITIVE LOGITS
    Token                 Logit
    " decreasing"          0.10
    " linear"              0.09
    ".interpolate"         0.09
    " decreases"           0.08
    "closing"              0.08
    "linear"               0.08
    "arly"                 0.08
                           0.08
    "_curve"               0.08
    " Lyrics"              0.08
    Activation Density: 0.028%

    No Known Activations