INDEX
    NEGATIVE LOGITS
     wastes         -0.07
     catastrophic   -0.07
    Stra            -0.07
    DDR             -0.07
     blinded        -0.07
     DK             -0.07
     Gaussian       -0.06
     gaussian       -0.06
     Tale           -0.06
    ogeneity        -0.06
    POSITIVE LOGITS
    earer           0.07
     contractors    0.07
     Πολι           0.06
    "'              0.06
    ForObject       0.06
    .team           0.06
     alimentos      0.06
    (gui            0.06
     busiest        0.06
     plumber        0.06
    Act Density 0.002%

    No Known Activations