INDEX
    Explanations
    Negative Logits
    "_notify"            -0.06
    " furniture"         -0.06
    " gains"             -0.06
    "kl"                 -0.06
    " imaginary"         -0.06
    "endra"              -0.06
    " systematically"    -0.06
    " pain"              -0.06
    " UC"                -0.05
    " развити"           -0.05
    Positive Logits
    " Network"           0.09
    " network"           0.08
    " networks"          0.07
    "Network"            0.07
    " (::"               0.07
    " Networks"          0.07
    " wordpress"         0.07
    " ANN"               0.07
    " MLP"               0.07
    " Ahmad"             0.07
    Activation Density: 0.005%

    No Known Activations