INDEX
    Explanations

    positive adjectives and superlatives

    expressions of personal preference or favorite things

    Negative Logits
    "oros"        -0.60
    "hai"         -0.58
    "icans"       -0.56
    "etta"        -0.55
    "elf"         -0.55
    " revise"     -0.55
    "ulent"       -0.54
    "oris"        -0.54
    "ゼ"          -0.53
    " resumes"    -0.53
    Positive Logits
    " ones"        0.91
    " standout"    0.80
    " favorites"   0.80
    " none"        0.77
    "liest"        0.76
    " singled"     0.74
    " hardest"     0.74
    " favourites"  0.74
    "none"         0.72
    "Probably"     0.71
    Activation Density 0.554%

    No Known Activations