INDEX
    Explanations

    sentences about personal opinions or experiences

    Negative Logits
    opez      -0.76
    elve      -0.74
    luaj      -0.74
    dies      -0.72
    asers     -0.69
    styles    -0.69
    aneers    -0.67
    ãĤ©       -0.66
    letes     -0.65
    chev      -0.65
    Positive Logits
     happening     0.97
     definitely    0.90
     NOT           0.85
     supposed      0.84
     gonna         0.84
     unacceptable  0.81
    nt             0.80
     truly         0.79
     not           0.79
     purely        0.77
    Act Density 0.117%

    No Known Activations