INDEX
    Explanations

    Words related to apologies, and the concept of apology itself.

    Negative Logits
    "leston"    -0.18
    "assen"     -0.17
    "ledge"     -0.16
    "t"         -0.15
    "ussen"     -0.15
    "eenth"     -0.15
    "lest"      -0.15
    "pot"       -0.15
    "ainty"     -0.14
    "hardt"     -0.14

    Positive Logits
    " Ap"        0.20
    " ap"        0.19
    "-ap"        0.16
    "ठन"         0.16
    "rika"       0.16
    "ooled"      0.16
    "emann"      0.15
    "regon"      0.15
    "(ap"        0.15
    "portion"    0.15
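    The logit lists above are most likely the feature's direct effect on the output vocabulary. The sketch below assumes they are produced the way many SAE dashboards do it: project the feature's decoder vector through the model's unembedding matrix, then take the most negative and most positive entries. The function name `logit_effects` and the argument names are hypothetical, chosen for illustration.

```python
import numpy as np


def logit_effects(w_dec_row, W_U, tokens, k=10):
    """Return the k most negative and k most positive logit effects of one feature.

    Assumed (hypothetical) shapes:
      w_dec_row: (d_model,)        decoder vector of a single SAE feature
      W_U:       (d_model, vocab)  unembedding matrix
      tokens:    list of vocab strings, len(tokens) == vocab
    """
    # Direct contribution of the feature direction to every token's logit.
    logits = w_dec_row @ W_U
    # Indices sorted from most negative to most positive.
    order = np.argsort(logits)
    negative = [(tokens[i], float(logits[i])) for i in order[:k]]
    positive = [(tokens[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive
```

    With a real model, `tokens` would be the tokenizer's vocabulary, which is why entries such as " Ap" keep their leading space: it is part of the token.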
    Act Density 0.018%
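    Assuming "Act Density" means the share of sampled token positions at which the feature fires (activation above zero), 0.018% corresponds to roughly 18 active positions per 100,000 tokens. A minimal sketch of that calculation follows; `activation_density` and its `threshold` parameter are hypothetical names, not a documented API.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds `threshold`.

    `acts` is a hypothetical 1-D array holding one feature's activation at
    every sampled token position.
    """
    acts = np.asarray(acts, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one token position")
    # Count firing positions and express them as a percentage of all positions.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size
```

    For example, an array of 100,000 activations with 18 positive entries yields a density of 0.018 (percent), matching the figure reported here.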

    No Known Activations