INDEX
    Explanations

    instances of gratitude and appreciation expressed in relation to others

    Negative Logits
     aw
    -0.19
    ato
    -0.15
     ask
    -0.14
    inf
    -0.14
     category
    -0.14
    heimer
    -0.14
    .aw
    -0.14
    uala
    -0.14
     mean
    -0.13
     Hed
    -0.13
    Positive Logits
    istrovství
    0.17
    üzel
    0.17
    roma
    0.15
    GuidId
    0.15
    omanip
    0.15
     разд
    0.14
    ivals
    0.14
    θή
    0.14
    rame
    0.14
    ś
    0.14
    Activation Density: 0.006%

    No Known Activations