    Negative Logits
    Token          Logit
    " chosen"      -0.07
                   -0.07
    " fashion"     -0.07
    "ety"          -0.07
    " sudden"      -0.06
    " men"         -0.06
    " fried"       -0.06
    "_address"     -0.06
    " стали"       -0.06
    " novelty"     -0.06
    Positive Logits
    Token          Logit
    " relu"        0.07
    "аніт"         0.07
    " cues"        0.06
    "_decl"        0.06
    " '%$"         0.06
    ".herokuapp"   0.06
    " pohy"        0.06
    ".addColumn"   0.06
    "scrollView"   0.06
    "&w"           0.06
    Activation Density: 0.013%

    No Known Activations