INDEX
    Explanations
    Negative Logits
     денег
    -0.08
    _toggle
    -0.08
     bipolar
    -0.07
    -0.07
    _school
    -0.07
     princes
    -0.07
    宝贝
    -0.07
     continent
    -0.07
    -seat
    -0.07
     liste
    -0.06
    Positive Logits
     отно
    0.07
     interest
    0.07
    iale
    0.07
    Ί
    0.07
    Answer
    0.07
    MORE
    0.07
    0.07
    0.07
    0.07
     adds
    0.07
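The two lists above presumably rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). The sketch below shows one common way such lists are built. It assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The names `decoder_direction`, `unembed`, `vocab`, and `logit_effects` are hypothetical and are not taken from this page's actual pipeline.

```python
def logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, score) pairs.

    Assumed shapes (hypothetical):
      decoder_direction: list of d_model floats (the feature's decoder vector)
      unembed: d_model rows, each a list of len(vocab) floats
      vocab: list of token strings, one per unembedding column
    """
    d_model = len(decoder_direction)
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with token j's unembedding column.
        s = sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, s))
    # Sort ascending by score: the head holds the most suppressed tokens.
    ranked = sorted(scores, key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive
```

Scores of similar magnitude on both sides, as with the values around ±0.07 here, would suggest the feature has only a weak, diffuse direct effect on the output vocabulary.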
    Activation Density 0.012%

    No Known Activations
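The density figure indicates how rarely this feature fires. The sketch below assumes it is the percentage of sampled tokens whose activation exceeds zero. Both the threshold and the token sample are assumptions, and `activation_density` is a hypothetical helper. Under that reading, 0.012% corresponds to about 12 firing tokens in every 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold.

    Assumption: density = 100 * (#tokens with activation > threshold) / #tokens.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Usage: 12 nonzero activations among 100,000 tokens.
sample = [0.0] * 99_988 + [1.0] * 12
print(activation_density(sample))  # 0.012
```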