    Negative Logits
        " Novel"             -0.07
        "\tright"            -0.06
        " llam"              -0.06
        "олее"               -0.06
        "margin"             -0.06
        " plates"            -0.06
        " характер"          -0.06
        ".Write"             -0.06
        (token not shown)    -0.06
        "州市"               -0.06
    Positive Logits
        " nerd"              0.07
        " MOR"               0.07
        " Din"               0.06
        " tune"              0.06
        " JPG"               0.06
        "Rnd"                0.06
        "Easy"               0.06
        "uted"               0.06
        " ostr"              0.05
        "uto"                0.05
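The page does not say how these logit lists are computed. One common approach for sparse-autoencoder features is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest-scoring and k highest-scoring vocabulary tokens. This is an assumption, not something the source confirms. The minimal pure-Python sketch below shows that projection. The names `top_logits`, `decoder_vec`, and `unembed` are hypothetical, and the toy numbers are illustrative only.

```python
def top_logits(decoder_vec, unembed, vocab, k=10):
    """Hypothetical sketch: score each vocab token by projecting a feature's
    decoder direction through an unembedding matrix (ASSUMED method).

    decoder_vec: list of d_model floats (the feature's decoder row)
    unembed:     d_model rows, each a list of vocab_size floats (W_U)
    vocab:       list of vocab_size token strings
    Returns (negative, positive) lists of (token, score) pairs.
    """
    d_model = len(decoder_vec)
    vocab_size = len(vocab)
    # Direct logit contribution of the feature direction to each token.
    scores = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Sort token indices from lowest to highest score.
    order = sorted(range(vocab_size), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    # Highest scores first for the positive list.
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with a 2-dimensional model and a 3-token vocabulary.
neg, pos = top_logits([1.0, -1.0], [[0.1, 0.0, 0.3], [0.0, 0.2, 0.1]], ["a", "b", "c"], k=1)
print(neg, pos)
```

Because the scores come from a single linear projection, the small magnitudes in the tables (around ±0.07) would be consistent with a feature that only weakly pushes any particular token up or down.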
    Activation Density: 0.014%
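Activation density is usually read as the fraction of token positions on which a feature fires. Here, "fires" is assumed to mean an activation strictly above zero, since the page does not state the threshold. Under that reading, 0.014% corresponds to roughly 14 of every 100,000 tokens. `activation_density` below is a hypothetical helper that sketches the calculation.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature's
    activation exceeds `threshold` (ASSUMED definition of "fires")."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 14 firing positions out of 100,000 tokens -> 0.014%.
acts = [1.0] * 14 + [0.0] * 99_986
print(f"{activation_density(acts) * 100:.3f}%")
```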

    No Known Activations