    Negative Logits
    Token               Logit
    " pParent"          -0.08
    (blank in source)   -0.07
    "۷"                 -0.07
    " dém"              -0.07
    "_DELTA"            -0.06
    "(vals"             -0.06
    "chter"             -0.06
    " Bakery"           -0.06
    "sudo"              -0.06
    " statt"            -0.06
    Positive Logits
    Token               Logit
    ",g"                0.07
    " worldview"        0.06
    " então"            0.06
    ".transitions"      0.06
    " altogether"       0.06
    " вигля"            0.06
    "-to"               0.06
    " вполне"           0.06
    " especially"       0.06
    " неб"              0.06
    Activation Density: 0.014%

    No Known Activations