INDEX
    Explanations

    phrases indicating evaluation, such as "bad" or "difficult"

    phrases expressing difficulty or negative evaluations

    Negative Logits
    ãĤ½      -0.75
    urd      -0.71
    Offline  -0.68
    gypt     -0.67
    yth      -0.67
    ystem    -0.65
    last     -0.64
    tv       -0.64
    iHUD     -0.63
    wake     -0.63
    Positive Logits
    agher        0.73
    nels         0.66
     Carbuncle   0.66
     elbows      0.65
     harass      0.64
     slicing     0.63
     cruising    0.62
     temptation  0.62
     flavors     0.59
     elbow       0.58
    Act Density 0.127%

    No Known Activations