    Negative Logits
        " subset"         -0.07
        " butterfly"      -0.07
        " development"    -0.06
        "growth"          -0.06
        "_relative"       -0.06
        " collapsed"      -0.06
        "_wait"           -0.06
        "diamond"         -0.06
        "emporary"        -0.06
        " drama"          -0.06
    Positive Logits
        "动生成"           0.06
        " آسیاب"          0.06
        " dvoj"           0.06
        " Intern"         0.06
        (blank)           0.06
        ".fil"            0.06
        " Шев"            0.06
        " обуч"           0.06
        " Öğ"             0.06
        (blank)           0.06
    Act Density 0.122%

    No Known Activations