    Negative Logits
    Token               Logit
    " elderly"          -0.06
    "?>↵↵"              -0.06
    "вед"               -0.06
    " cosmetics"        -0.06
    " Lobby"            -0.06
    "598"               -0.06
    (token not shown)   -0.06
    ".bd"               -0.06
    "солют"             -0.06
    "\tinclude"         -0.06
    Positive Logits
    Token               Logit
    " untouched"        0.07
    " họa"              0.07
    " todo"             0.07
    "(Il"               0.06
    "оді"               0.06
    " unsuccessfully"   0.06
    " Boards"           0.06
    "isy"               0.06
    "cor"               0.06
    " окрем"            0.06
    Activation Density: 0.002%

    No Known Activations