    Negative Logits
    Token           Logit
    "атр"           -0.08
    " wellness"     -0.08
    " Wellness"     -0.08
    " Plain"        -0.07
    " Bold"         -0.07
    "报警"          -0.07
    " SWOT"         -0.07
    " Wert"         -0.07
    "MAG"           -0.07
    " bold"         -0.07
    Positive Logits
    Token           Logit
    " reduzir"      0.13
    " reduces"      0.12
    " reducir"      0.12
    " drastically"  0.12
    " reduce"       0.12
    " reduziert"    0.12
    "减少"          0.11
    " reduz"        0.11
    " reducing"     0.11
    " reduzieren"   0.10
    Activation Density: 0.034%

    No Known Activations