INDEX
    Explanations

    expressions of gratitude or thankfulness

    Negative Logits
    "oto"       -0.73
    "smoking"   -0.72
    "agate"     -0.69
    "calling"   -0.68
    "opers"     -0.66
    "uter"      -0.66
    "change"    -0.66
    "improve"   -0.65
    "inic"      -0.65
    "dump"      -0.64

    Positive Logits
    " acknowled"       1.03
    "giving"           0.97
    " citiz"           0.92
    " pardon"          0.91
    "gements"          0.83
    " acknowledgment"  0.78
    " gratitude"       0.76
    "FUL"              0.74
    "NESS"             0.74
    "ledged"           0.74
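This page does not say how its logit lists were produced. Feature dashboards of this kind commonly multiply the feature's decoder direction by the model's unembedding matrix and keep the most positive and most negative entries. The following is a minimal sketch under that assumption. `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the toy numbers are invented, not taken from this feature. Real pipelines may also normalize or rescale the scores, so this sketch will not reproduce the values listed above.

```python
from typing import List, Tuple

# Hypothetical helper (assumption): ranks vocabulary tokens by how strongly a
# feature's decoder direction pushes their logits up or down.
def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k positive, top-k negative) lists of (token, score) pairs.

    decoder_dir: feature direction in the residual stream, length d_model.
    unembed: unembedding matrix W_U with d_model rows and len(vocab) columns.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per residual dimension")
    # score[t] = sum_i decoder_dir[i] * unembed[i][t], i.e. decoder_dir @ W_U
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    vocab = [" gratitude", " thanks", "smoking", "dump"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [0.8, 0.6, -0.4, -0.2],
        [0.4, 0.2, -0.6, -0.8],
    ]
    pos, neg = logit_effects(decoder_dir, unembed, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Leading spaces are kept inside the token strings because, in many tokenizers, " gratitude" and "gratitude" are different tokens.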

    Activation Density: 9.741%
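The page does not define the density figure above. A common reading is the percentage of sampled token positions where the feature's activation is above zero. The following is a minimal sketch under that assumption. `activation_density` and its `threshold` parameter are hypothetical, and the activations in the example are made up.

```python
from typing import Sequence

# Hypothetical helper (assumption): activation density as the share of token
# positions on which a feature's activation exceeds a threshold (0 by default).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions where activation > threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy activations over ten token positions, invented for illustration.
    acts = [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"Activation density {activation_density(acts):.3f}%")  # 2 of 10 fire -> 20.000%
```

Under this reading, a density of 9.741% would mean the feature fires on roughly one token in ten.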

    No Known Activations