    Explanations

    instances of differentiation and distinctions among concepts or categories

    Negative Logits
        Token               Logit
        " earnest"          -0.68
        " excess"           -0.65
        " understatement"   -0.62
        "ccoli"             -0.58
        "upid"              -0.56
        " underrated"       -0.56
        " omission"         -0.55
        "fin"               -0.55
        "unc"               -0.54
        " absence"          -0.54
    Positive Logits
        Token               Logit
        " altogether"       1.02
        " than"             0.95
        " depending"        0.95
        "than"              0.92
        "iates"             0.88
        "iating"            0.87
        "iations"           0.85
        " different"        0.83
        "Different"         0.81
        "styles"            0.78
    Activation Density: 0.749%

    No Known Activations