INDEX
    Explanations

    model performance evaluation

    This neuron activates on words referring to class imbalance or unbalanced datasets.

    Negative Logits
    Token           Logit
    "008"           -0.07
    " reaction"     -0.06
    " memories"     -0.06
    " branches"     -0.06
    " Gaussian"     -0.06
    "ArrayType"     -0.06
    "inally"        -0.06
    " MART"         -0.06
    " reactions"    -0.06
    ",args"         -0.06
    Positive Logits
    Token           Logit
    " Something"    0.07
    "(Contact"      0.07
    " volumes"      0.06
    "verb"          0.06
                    0.06
    " rejects"      0.06
    "VB"            0.06
    "anneer"        0.06
    " subroutine"   0.06
    "niž"           0.06
    Activation Density: 0.006%

    No Known Activations