    Explanations

    words related to moral concerns and ethical dilemmas

    Negative Logits
    Token           Logit
    "_instances"    -0.16
    "VO"            -0.15
    "aucoup"        -0.15
    " VO"           -0.15
    "itur"          -0.15
    "ë"             -0.15
    " Vo"           -0.14
    "ourse"         -0.14
    "iken"          -0.14
    " vo"           -0.14
    Positive Logits
    Token                         Logit
    "InvalidArgumentException"    0.15
    "esto"                        0.14
    "athe"                        0.14
    "argent"                      0.14
    "otts"                        0.14
    ".cgi"                        0.14
    "enger"                       0.14
    "gren"                        0.14
    "eps"                         0.14
    "ationale"                    0.14
    Act Density 0.003%

    No Known Activations