    NEGATIVE LOGITS
     انت            -0.07
     Tip            -0.07
    _reservation    -0.07
    anager          -0.06
    .Bit            -0.06
    Cash            -0.06
     lớn            -0.06
     consequence    -0.06
     defe           -0.06
     coefficient    -0.06
    POSITIVE LOGITS
     foods          0.14
     Foods          0.12
    foods           0.07
     unittest       0.06
     erot           0.06
    (token not shown)  0.06
     alimentos      0.06
    (token not shown)  0.06
     invade         0.06
    -food           0.06
    Activation Density: 0.008%

    No Known Activations