    Explanations

    variations or distinctions in behavior or treatment among subjects

    contrasting how things are done

    Negative Logits
    " ="          -0.48
                  -0.42
    "an"          -0.41
    "*"           -0.38
    "chließ"      -0.37
    "com"         -0.37
    "="           -0.37
    " of"         -0.36
    "0"           -0.36
    "िल्म"         -0.36
    Positive Logits
    " differently"       1.80
    " anders"            1.01
    " differentially"    0.98
    " Differ"            0.94
    " diffé"             0.93
    " inaczej"           0.91
    "differ"             0.91
    "Differ"             0.79
    " autrement"         0.78
    " DIFFERENT"         0.73
    Activation Density: 0.006%

    No Known Activations