INDEX
    Explanations
    Negative Logits
     propio         -0.08
     protr          -0.08
     cenário        -0.07
    Combination     -0.07
    801             -0.07
    出租            -0.07
     المض           -0.07
     dish           -0.07
     publicidade    -0.07
     génére         -0.07
    Positive Logits
     пет         0.09
    -knit        0.09
     ομά         0.08
     קבוצ        0.08
    Interested   0.08
    _learning    0.08
    兴趣         0.08
    ukturen      0.08
     epistem     0.08
    uhu          0.08
    Act Density 0.011%

    No Known Activations