    Explanations

    expressions of empathy and sympathy

    Negative Logits
    Token           Logit
    "acco"          -0.20
    "aler"          -0.18
    "ebi"           -0.17
    "unar"          -0.15
    "IFICATION"     -0.15
    " dök"          -0.15
    "ebo"           -0.15
    "kl"            -0.15
    "antino"        -0.15
    "URE"           -0.14
    Positive Logits
    Token           Logit
    "etic"          0.36
    "etically"      0.35
    "izing"         0.35
    "izers"         0.35
    "ies"           0.33
    "ize"           0.32
    "izer"          0.30
    "ized"          0.29
    "izes"          0.28
    "etical"        0.27
    Activation Density: 0.015%

    No Known Activations