INDEX
    Negative Logits
    Token          Logit
    " barbar"      -0.07
    "ARR"          -0.07
    " alarm"       -0.07
    "ari"          -0.07
    " unin"        -0.06
    " рабоч"       -0.06
    " Р"           -0.06
    "ินการ"         -0.06
    "wl"           -0.06
    " careless"    -0.06
    Positive Logits
    Token               Logit
    " duct"             0.17
    "DUCT"              0.10
    "duct"              0.09
    (blank in source)   0.08
    "uct"               0.07
    "DC"                0.07
    "UCT"               0.07
    "_DC"               0.07
    ".products"         0.07
    "Educ"              0.07
    Activation Density: 0.002%

    No Known Activations