INDEX
    Explanations
    NEGATIVE LOGITS
    " Rail"     -0.08
    "(Control"  -0.07
    " vajad"    -0.07
    " znak"     -0.07
    " wymag"    -0.07
    "REQ"       -0.07
    "_exist"    -0.07
    " Pric"     -0.07
    "ambient"   -0.07
    "Req"       -0.07
    POSITIVE LOGITS
    "突出"          0.10
    " полно"        0.09
    " kroner"       0.09
    " gummies"      0.08
    " synergy"      0.08
    "ibble"         0.08
    " sikre"        0.08
    "ipps"          0.08
    "uphoria"       0.08
    " performance"  0.08
    Act Density 0.013%

    No Known Activations