    Negative Logits
    Token                        Logit
    (token missing in source)    -0.07
    "(S"                         -0.07
    "tlement"                    -0.06
    " FormGroup"                 -0.06
    " Maven"                     -0.06
    "цієн"                       -0.06
    " incel"                     -0.06
    "\tconsole"                  -0.06
    "Acts"                       -0.06
    "<Message"                   -0.06
    Positive Logits
    Token          Logit
    "ier"          0.09
    "IER"          0.08
    "ISING"        0.07
    " fier"        0.07
    " mater"       0.07
    "fgets"        0.06
    " additive"    0.06
    "ılır"         0.06
    "DER"          0.06
    "rier"         0.06
    Activation Density: 0.012%

    No Known Activations