INDEX
    Explanations

    expressions of gratitude towards others

    Negative Logits
    " dominates"      -0.70
    "ths"             -0.70
    "Course"          -0.68
    "thing"           -0.63
    " Worse"          -0.63
    "Temperature"     -0.60
    "ighed"           -0.59
    "hibition"        -0.59
    "POL"             -0.59
    "isation"         -0.58
    Positive Logits
    "omever"          0.86
    "RIP"             0.81
    " contributors"   0.74
    " involved"       0.73
    "Ü"               0.73
    "rats"            0.73
    "involved"        0.71
    " congr"          0.70
    " listeners"      0.70
    " readers"        0.69
    Activation Density: 0.163%

    No Known Activations