    Negative Logits
     nasty         -0.06
     aure          -0.06
     organs        -0.06
    _null          -0.06
    Informe        -0.06
    (cap           -0.06
    UU             -0.06
    attro          -0.06
     obedient      -0.06
                   -0.06
    Positive Logits
     misleading     0.09
    über            0.08
     guidelines     0.07
    _checker        0.07
     ember          0.07
     Dataset        0.07
                    0.07
    leading         0.07
    bestos          0.07
    ;margin         0.07
    Act Density 0.003%

    No Known Activations