    Explanations

    words related to concerns or worries

    Negative Logits
    Token         Logit
    "nice"        -0.74
    "ingers"      -0.74
    "lite"        -0.70
    "ples"        -0.61
    "ophone"      -0.61
    " handy"      -0.60
    "iller"       -0.58
    "unes"        -0.58
    "dating"      -0.58
    " slick"      -0.57

    Positive Logits
    Token         Logit
    "warts"        1.06
    "wart"         1.04
    "lessly"       0.91
    " about"       0.85
    " trolling"    0.84
    "ingly"        0.82
    " bells"       0.81
    "ieties"       0.80
    " ABOUT"       0.77
    " regarding"   0.76

    Act Density 0.532%

    No Known Activations