INDEX
    Explanations

    emotions and their expressions, particularly those related to suffering and regret

    Negative Logits
    Token       Logit
    utow        -0.17
    byt         -0.15
    itaire      -0.15
    imoto       -0.15
    lak         -0.15
     Fare       -0.14
    ouri        -0.14
    nez         -0.14
    /sidebar    -0.14
    аци         -0.14
    Positive Logits
    Token       Logit
    ful         0.98
    fully       0.79
    full        0.78
    FUL         0.71
    fulness     0.69
    FULL        0.65
    -full       0.56
     ful        0.52
    Full        0.48
    eful        0.47
    Activation Density: 0.079%

    No Known Activations