INDEX
    Explanations

    expressions of empathy, support, and gratitude towards others

    expressions of gratitude and acknowledgment towards various groups of people

    Negative Logits
    Token            Logit
    "illac"          -0.66
    "potion"         -0.62
    "ingo"           -0.60
    " exception"     -0.60
    "feature"        -0.58
    "oute"           -0.58
    "tesque"         -0.57
    "culosis"        -0.57
    "iasis"          -0.57
    " disclaimer"    -0.56
    Positive Logits
    Token            Logit
    " involved"      0.99
    " stakeholders"  0.96
    " parties"       0.94
    "ocating"        0.91
    " sorts"         0.87
    "igators"        0.86
    " kinds"         0.85
    " facets"        0.85
    "iances"         0.83
    " mankind"       0.83
    Activation Density: 0.079%

    No Known Activations