    Explanations

    phrases expressing positive or negative evaluations and judgments

    Negative Logits
    Token         Logit
    " \'"         -0.76
    "onut"        -0.66
    "ench"        -0.65
    "ohyd"        -0.64
    "kee"         -0.62
    "acca"        -0.62
    "ilian"       -0.61
    "ivalry"      -0.59
    "haw"         -0.58
    "cies"        -0.58
    Positive Logits
    Token         Logit
    " someday"    0.83
    " tomorrow"   0.81
    " if"         0.78
    " Osw"        0.69
    " Wouldn"     0.68
    " sooner"     0.67
    " wiser"      0.67
    "morrow"      0.66
    " feas"       0.65
    " forever"    0.65
    Act Density 0.191%
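
    The page does not say how the density figure is computed. A common reading is the fraction of tokens on which the feature activates at all, and the sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires; the source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 tokens, 191 of which have a nonzero activation.
sample = [1.0] * 191 + [0.0] * (100_000 - 191)
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.191% means the feature fires on roughly 2 of every 1,000 tokens.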

    No Known Activations