INDEX
    Explanations

    expressions of gratitude or thanks

    expressions of gratitude

    Negative Logits
        " projecting"   -0.79
        " Osc"          -0.72
        " indo"         -0.66
        "女"            -0.64
        " unprotected"  -0.64
        "inese"         -0.63
        " projected"    -0.63
        " diver"        -0.62
        "IDER"          -0.61
        " deviation"    -0.61
    Positive Logits
        "gements"       1.10
        "gments"        1.05
        "giving"        0.96
        "ifully"        0.86
        "gment"         0.85
        "ments"         0.85
        " acknowled"    0.84
        " heavens"      0.83
        "fulness"       0.81
        "ees"           0.80
    Activation Density: 0.015%

    No Known Activations