    Negative Logits
     pleasing
    -0.08
     contendo
    -0.08
    062
    -0.08
     dura
    -0.08
     riêng
    -0.08
     verbetering
    -0.08
    บิน
    -0.08
    787
    -0.08
     достой
    -0.08
     dân
    -0.08
    Positive Logits
     NUE
    0.07
    tay
    0.07
    Gud
    0.07
    YU
    0.07
    .transparent
    0.07
    ARCH
    0.07
    0.07
     forces
    0.07
    Jac
    0.07
     aus
    0.07
    Activation Density: 0.001%

    No Known Activations