INDEX
    Explanations

    comparisons between different values or statistics

    phrases that involve comparisons

    Negative Logits
    "ANCE"        -0.75
    "hardt"       -0.67
    " Inquiry"    -0.64
    "hack"        -0.64
    "iband"       -0.61
    "*/("         -0.61
    "Spons"       -0.61
    "Der"         -0.60
    "awar"        -0.60
    "colo"        -0.60
    Positive Logits
    " ours"           0.85
    " usual"          0.82
    "whelming"        0.75
    " those"          0.74
    " previous"       0.72
    "average"         0.70
    " traditional"    0.68
    " peers"          0.67
    " typical"        0.65
    " mere"           0.64
    Activation Density: 0.075%

    No Known Activations