INDEX
    Explanations

    expressions of appreciation and gratitude

    Negative Logits
     themselves
    -0.29
     yourselves
    -0.19
    're
    -0.18
     сами
    -0.17
    Were
    -0.17
     Were
    -0.17
    ’re
    -0.17
     himself
    -0.17
    taient
    -0.16
     herself
    -0.16
    Positive Logits
     am
    0.65
    ’m
    0.38
    'm
    0.34
     могу
    0.33
     haven
    0.32
    am
    0.29
     دارم
    0.28
     Am
    0.28
    .am
    0.28
     have
    0.27
    Activation Density 0.265%

    No Known Activations