INDEX
    Explanations
    NEGATIVE LOGITS
     DCHECK  -0.08
    (token not rendered)  -0.08
    (token not rendered)  -0.08
     zeros  -0.07
     annoyed  -0.07
     заявил  -0.07
    ��  -0.07
    (token not rendered)  -0.07
    .delay  -0.07
    (token not rendered)  -0.07
    POSITIVE LOGITS
     niezb  0.08
    	typ  0.07
     มกร  0.07
    (token not rendered)  0.07
    	sf  0.06
    .Device  0.06
    gu  0.06
    jur  0.06
    -router  0.06
     Core  0.06
    Act Density 0.007%

    No Known Activations