INDEX
    Negative Logits
     возможности    -0.08
     мл             -0.07
    それ            -0.07
    Radius          -0.07
    eten            -0.06
    _Number         -0.06
     grown          -0.06
    Все             -0.06
    multi           -0.06
    Console         -0.06
    Positive Logits
    -support        0.07
     lign           0.07
    σκε             0.07
     Internet       0.07
     Ngày           0.07
     Bec            0.06
    Soft            0.06
    ظˆ              0.06
     Lovely         0.06
    \Application    0.06
    Activation Density: 0.002%

    No Known Activations