INDEX
    Explanations

    comparisons where one option is explicitly contrasted with another

    phrases that present contrasting ideas or alternatives

    Negative Logits
    Token         Logit
    "enegger"     -0.64
    " FANT"       -0.64
    " Dahl"       -0.61
    "boys"        -0.60
    "redo"        -0.59
    " Kard"       -0.59
    " chairs"     -0.57
    " Semin"      -0.57
    "fly"         -0.56
    "mberg"       -0.55
    Positive Logits
    Token             Logit
    "itably"          0.79
    " necessarily"    0.73
    " thereto"        0.73
    "icip"            0.72
    "arily"           0.68
    "ifice"           0.68
    "agonist"         0.66
    "thodox"          0.66
    "ively"           0.66
    "oldown"          0.63
    Activation Density: 0.025%

    No Known Activations