    Explanations

    phrases describing strong and clear contrasts

    references to stark contrasts or inequalities

    Negative Logits
    Token             Logit
    "hops"            -0.83
    "annis"           -0.71
    "ipop"            -0.71
    "uthor"           -0.70
    " diligently"     -0.69
    "andom"           -0.69
    "aceae"           -0.68
    "onz"             -0.68
    "RAFT"            -0.67
    "ilk"             -0.64
    Positive Logits
    Token             Logit
    " contrasts"       1.22
    " contrast"        1.09
    "ly"               1.07
    " stark"           0.95
    " departure"       0.91
    " difference"      0.90
    " differences"     0.86
    " reminders"       0.83
    " contradiction"   0.82
    "ethy"             0.82
    Activation Density: 0.069%

    No Known Activations