INDEX
    NEGATIVE LOGITS
    " expand"         -0.07
    " +#+#+#+"        -0.06
    " πολι"           -0.06
    "/portfolio"      -0.06
    "ported"          -0.06
    "equal"           -0.06
    " convenient"     -0.06
    "-collection"     -0.06
    " Zip"            -0.06
    "РИ"              -0.06
    POSITIVE LOGITS
    " disagreed"      0.10
    " disagrees"      0.09
    " disagreement"   0.09
    " disagree"       0.08
    " discard"        0.08
    " ilç"            0.07
    "SK"              0.07
    "aro"             0.07
    "ihu"             0.07
    " agreeing"       0.07
    Activation Density: 0.006%

    No Known Activations