    Explanations

    phrases related to comparisons or opinions

    phrases that introduce examples or justifications

    Negative Logits
    Token         Logit
    ptives        -0.71
    "],           -0.71
    ritz          -0.67
    taboola       -0.66
    aley          -0.66
     Cosponsors   -0.63
    ounter        -0.62
     Kirin        -0.61
    atform        -0.61
    ombat         -0.61
    Positive Logits
    Token         Logit
    ĪĴ             0.97
     paraph        0.61
    acea           0.58
    opsis          0.58
     rightly       0.57
    mitt           0.57
     parentheses   0.57
     pse           0.57
    -)             0.56
     bearer        0.55
    Act Density: 0.324%

    No Known Activations