    Explanations

    phrases indicating comparisons or contrasts in various contexts

    Negative Logits
    Token             Logit
    "الدراسه"         -0.38
    "RegressionTest"  -0.36
    " متعلقه"         -0.35
    "pidos"           -0.34
    " dalších"        -0.33
    "isNameExpr"      -0.33
    "DJANGO"          -0.32
    "ilov"            -0.32
    "ToScroll"        -0.32
    "uctura"          -0.31
    Positive Logits
    Token             Logit
    " different"      3.53
    "different"       3.05
    "Different"       2.88
    " Different"      2.81
    " diferente"      2.64
    " diferentes"     2.61
    " différent"      2.52
    "不同的"           2.47
    " différente"     2.42
    "不同"             2.36
    Activation Density: 1.851%

    No Known Activations