INDEX
    Explanations

    mentions of items that come in pairs

    references to "pair" or sets of items

    Negative Logits
    "ulhu"            -0.98
    "inez"            -0.71
    " Causes"         -0.69
    "Interstitial"    -0.69
    "ESCO"            -0.64
    "UGE"             -0.63
    "avez"            -0.63
    "amaz"            -0.63
    "INA"             -0.63
    "ADRA"            -0.62
    Positive Logits
    "pair"            0.96
    "rings"           0.95
    "ings"            0.95
    "ably"            0.93
    "wise"            0.89
    " pair"           0.82
    "lihood"          0.79
    "pieces"          0.79
    "ring"            0.77
    " paired"         0.76
    Activation Density: 0.018%

    No Known Activations