    Explanations
    Negative Logits
    "同等"          -0.07
    " afect"        -0.07
    " belly"        -0.07
    "[date"         -0.07
    "Board"         -0.07
    " feared"       -0.07
    "eneg"          -0.07
    "勃勃"          -0.07
    (missing)       -0.06
    "]['"           -0.06
    Positive Logits
    " двигател"     0.07
    "红旗"          0.07
    " commission"   0.07
    "jas"           0.06
    " gig"          0.06
    " commande"     0.06
    " програм"      0.06
    " пользоват"    0.06
    " Ru"           0.06
    "sto"           0.06
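The two logit lists can be read as the tokens whose output logits this feature most decreases and most increases. A minimal sketch of one common way such lists are produced follows, assuming the feature's decoder direction is projected straight through the model's unembedding matrix; the page does not state its method or any normalization, and `top_logit_effects`, `decoder_vec`, `W_U` and `vocab` are hypothetical names used only for illustration.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumptions (not confirmed by the page):
      decoder_vec: shape (d_model,), the feature's decoder direction
      W_U:         shape (d_model, d_vocab), the unembedding matrix
      vocab:       list of token strings aligned with W_U's columns
    """
    # Direct logit contribution of the feature to every vocabulary token.
    logits = decoder_vec @ W_U
    # Indices sorted from most negative to most positive contribution.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

With a real model, `vocab` would come from the tokenizer, which is why entries such as `" afect"` carry a leading space: the space is part of the token itself.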
    Act Density 0.002%
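A density of 0.002% means the feature fired on roughly 2 of every 100,000 token positions sampled. A minimal sketch of the calculation follows, assuming density is the share of positions with activation strictly above a threshold of 0; the page states neither the threshold nor the sampled dataset, and `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where a feature's activation exceeds threshold.

    Assumption: every entry of `acts` is one token position from the sampled data.
    """
    acts = np.asarray(acts)
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size
```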

    No Known Activations