INDEX
    Explanations

    phrases related to debates, explanations, or arguments

    Negative Logits
    Token            Logit
    ãĥ¬ (レ)         -0.76
    breaking         -0.74
    hen              -0.71
    shit             -0.70
    oses             -0.69
    Desk             -0.69
    ante             -0.66
    enge             -0.66
    TY               -0.66
    hens             -0.66
    Positive Logits
    Token            Logit
    cher              0.81
     they             0.76
     accompanies      0.72
     someday          0.72
     justifies        0.69
     there            0.69
     although         0.69
     mismatch         0.68
     arose            0.67
     characterize     0.66
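    Both logit lists can be read as the feature's direct effect on the next-token scores. The sketch below assumes each score is the dot product of a hypothetical decoder vector `w_dec` with one column of a hypothetical unembedding matrix `W_U`, ranked from lowest to highest. The page does not say how its values are computed or normalized, so treat this as an illustration only.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    w_dec: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Score every vocabulary token against a feature direction.

    w_dec: hypothetical decoder vector, length d_model.
    W_U:   hypothetical unembedding matrix, d_model rows x vocab_size columns.
    """
    if len(W_U) != len(w_dec):
        raise ValueError("W_U must have one row per dimension of w_dec")
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with column j of W_U.
        score = sum(w_dec[i] * W_U[i][j] for i in range(len(w_dec)))
        scores.append((token, score))
    return scores


def top_logits(w_dec, W_U, vocab, k=10):
    """Return (most negative, most positive) tokens, k of each."""
    ranked = sorted(logit_effects(w_dec, W_U, vocab), key=lambda pair: pair[1])
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    vocab = ["a", "b", "c"]
    w_dec = [1.0, -0.5]
    W_U = [[0.2, -0.4, 0.9],
           [0.6, 0.1, -0.2]]
    neg, pos = top_logits(w_dec, W_U, vocab, k=1)
    print("negative:", neg)
    print("positive:", pos)
```

    Real pipelines may also apply layer normalization or other scaling before ranking. The ranking step is unaffected by those choices, but the reported magnitudes would change.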
    Activation density: 1.253%
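    Activation density is a percentage of token positions. The sketch below assumes a position counts as active when its activation is strictly greater than a threshold of 0. The page does not state its threshold or which token sample it used, and `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy activations over 8 token positions; 2 of them are positive.
    acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
    print(activation_density(acts))  # 25.0
```

    On the same reading, a density of 1.253% means the feature fires on roughly one token in 80.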

    No Known Activations