    Negative Logits
    Token            Logit
    " governed"      -0.08
    " должны"        -0.07
    " degrade"       -0.07
    " privileges"    -0.06
    " exchanges"     -0.06
    "Ev"             -0.06
    " credited"      -0.06
    "EEEE"           -0.06
    "_RESOURCES"     -0.06
    " 。↵"            -0.06
    Positive Logits
    Token            Logit
    " feel"          0.10
    " Feel"          0.09
    " Feeling"       0.08
    " felt"          0.07
    " feels"         0.07
    "(lo"            0.07
    " feeling"       0.07
    " SWT"           0.07
    " pyplot"        0.06
    "уди"            0.06
    Activation Density: 0.028%

    No Known Activations