INDEX
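    Negative Logits
    (token not rendered)    -0.07
    (token not rendered)    -0.07
    " touched"              -0.07
    " Based"                -0.06
    "剩下"                  -0.06
    " neighbour"            -0.06
    (token not rendered)    -0.06
    "gist"                  -0.06
    (token not rendered)    -0.06
    " misdemean"            -0.06
    Positive Logits
    "cono"                  0.07
    " dàng"                 0.07
    " Automation"           0.07
    (token not rendered)    0.07
    " difficile"            0.07
    " automation"           0.07
    "_RAM"                  0.07
    "おり"                  0.07
    " vực"                  0.07
    " יכולה"                0.07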
    Activation Density: 0.001%

    No Known Activations