INDEX
    Explanations

    comparisons between different entities

    phrases that compare two opposing concepts or entities, often in the format of "X vs. Y."

    Negative Logits
    "shire"     -0.77
    "spot"      -0.75
    " liner"    -0.72
    "oola"      -0.69
    "bean"      -0.65
    "Topics"    -0.64
    "estern"    -0.63
    " Lauder"   -0.61
    "tarian"    -0.60
    "lied"      -0.60
    Positive Logits
    "creen"     0.76
    "illa"      0.75
    "illas"     0.74
    "pecting"   0.74
    "ampa"      0.71
    ".,"        0.65
    " RHP"      0.63
    "pect"      0.62
    " seq"      0.61
    "iors"      0.60
    Activation Density: 0.017%

    No Known Activations