    Negative Logits
    Token          Logit
    " app"         -0.07
    "YC"           -0.07
    "出去"         -0.07
    "PROTO"        -0.07
    " mutated"     -0.06
    "сте"          -0.06
    " hwnd"        -0.06
    "tığını"       -0.06
    " margin"      -0.06
    " managing"    -0.06
    Positive Logits
    Token          Logit
    (missing)      0.07
    (missing)      0.06
    (missing)      0.06
    " Пер"         0.06
    (missing)      0.06
    "포츠"         0.06
    "irmingham"    0.06
    "acock"        0.05
    " beş"         0.05
    " DOWN"        0.05
    Act Density 0.014%

    No Known Activations