INDEX
    Explanations

    phrases related to comparisons or lists of items

    phrases that indicate comparison or similarity

    Negative Logits
    Token          Logit
    "ople"         -0.83
    "itiveness"    -0.70
    "/+"           -0.67
    "trap"         -0.63
    "rous"         -0.62
    "======"       -0.61
    "atism"        -0.60
    "\<"           -0.59
    "aca"          -0.59
    "orage"        -0.58
    Positive Logits
    Token          Logit
    " well"        1.75
    "well"         1.39
    " opposed"     1.13
    "pects"        0.99
    "ynchron"      0.95
    "part"         0.95
    "ociated"      0.90
    "ides"         0.88
    " diverse"     0.88
    " Well"        0.87
    Activation Density: 0.121%

    No Known Activations