INDEX
    Explanations

    phrases indicating differences or contrasts between various entities

    phrases comparing differences between entities or concepts

    Negative Logits
    tti         -0.90
    merga       -0.78
    isphere     -0.75
    taboola     -0.69
    Limited     -0.69
    aley        -0.68
    ranked      -0.66
    iband       -0.64
    Dur         -0.63
    bard        -0.63
    Positive Logits
     ours        1.08
     ordinary    1.01
     theirs      0.93
     others      0.92
     anything    0.91
     other       0.90
     yours       0.89
     what        0.86
     typical     0.83
     previous    0.82
    Act Density 0.085%

    No Known Activations