INDEX
    Explanations

    instances of comparison and contrast between different subjects or concepts

    Negative Logits
    Token          Logit
    "ough"         -0.17
    "yen"          -0.16
    "оÑĥ"          -0.16
    "ated"         -0.15
    "elry"         -0.15
    "ereotype"     -0.15
    "etur"         -0.15
    "uth"          -0.15
    "ration"       -0.14
    "oper"         -0.14
    Positive Logits
    Token          Logit
    "favor"        0.17
    "isons"        0.17
    " favor"       0.17
    "ãģ¹"          0.17
    "atively"      0.17
    " against"     0.16
    " apples"      0.16
    " unfavor"     0.16
    "contrast"     0.16
    "Against"      0.16
    Activation Density: 0.030%

    No Known Activations