INDEX
    Explanations

    phrases indicating judgment or value attribution

    instances of the word "that"

    Negative Logits
    " Pets"        -0.78
    " Cheong"      -0.74
    "hips"         -0.72
    " Directions"  -0.68
    "ãĥ¥"          -0.68
    " Gallery"     -0.65
    " Cong"        -0.64
    " Cards"       -0.64
    " Planning"    -0.63
    "raq"          -0.62

    Positive Logits
    " satisfies"   0.82
    " violates"    0.82
    " consumes"    0.81
    " preceded"    0.79
    " resembles"   0.74
    "ĨĴ"           0.74
    " produces"    0.73
    "¥µ"           0.72
    "cedes"        0.72
    " involves"    0.71

    Activation density: 0.240%

    No Known Activations