    Explanations

    phrases containing words that express positivity or admiration

    phrases that express preference for something being the best or better option

    Negative Logits
    naires     -0.74
    stairs     -0.70
    adra       -0.69
    sembly     -0.64
    ortment    -0.64
    giene      -0.64
    area       -0.63
    hiro       -0.60
    wash       -0.60
    aine       -0.59

    Positive Logits
     than          0.93
     testament     0.81
     encaps        0.80
     Than          0.76
     succinct      0.74
     nor           0.72
     deserving     0.70
     juxtap        0.69
     exempl        0.68
     illustration  0.67

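
    A minimal sketch of how logit tables like these are commonly produced, assuming
    (this page does not confirm it) that each value is the dot product of the
    feature's decoder direction with a token's unembedding vector. The function
    names, toy vectors, and two-dimensional model width are hypothetical.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits (assumed dot-product method).

def logit_weights(decoder_dir, unembed):
    """Dot the feature direction with each token's unembedding vector.

    decoder_dir: list of floats, the feature's (assumed) decoder direction.
    unembed: dict mapping token string -> list of floats of the same length.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(weights, k):
    """Return the k highest and k lowest (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy 2-dimensional example; all vectors are made up for illustration.
decoder_dir = [1.0, -0.5]
unembed = {
    " than": [0.9, 0.0],
    " nor": [0.5, -0.4],
    "stairs": [-0.6, 0.2],
    "area": [-0.3, 0.6],
}
positive, negative = top_and_bottom(logit_weights(decoder_dir, unembed), 2)
print("positive:", [tok for tok, _ in positive])  # → [' than', ' nor']
print("negative:", [tok for tok, _ in negative])  # → ['stairs', 'area']
```

    Leading spaces are kept in the token strings because, as in the table above,
    " than" and "than" would be different vocabulary entries.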
    Activation Density: 0.114%
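
    Activation density is read here as the fraction of token positions on which the
    feature fires, so 0.114% would mean roughly 114 activations per 100,000 tokens.
    The sketch below assumes "fires" means an activation strictly above zero; the
    threshold, function name, and toy data are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# where a feature's activation is above a threshold (assumed > 0 here).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions with activation > threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 1,000 token positions, the feature fires on exactly one.
acts = [0.0] * 1000
acts[42] = 3.1
print(f"{activation_density(acts):.3%}")  # → 0.100%
```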

    No Known Activations