INDEX
    Explanations

    expressions of personal preferences or dislikes

    expressions of dislike or negative sentiment

    Negative Logits
    yrinth      -0.83
    ensional    -0.83
    achine      -0.81
    aunder      -0.81
    rontal      -0.80
    monary      -0.79
    minster     -0.78
    alde        -0.77
    igmatic     -0.77
    estones     -0.76
    Positive Logits
     anymore    1.06
     anybody    0.90
     anyone     0.82
     anything   0.81
     bullies    0.79
     undue      0.77
     any        0.76
     nor        0.76
     surprises  0.74
    cens        0.72
    Activation Density: 0.084%

    No Known Activations