    Explanations

    phrases and terms related to justification, particularly in relation to moral or ethical behavior

    Negative Logits
    "لان"             -0.15
    "alama"           -0.15
    "jac"             -0.15
    "ichel"           -0.14
    "λαν"             -0.14
    "_DEPEND"         -0.14
    "_DEPRECATED"     -0.14
    "endi"            -0.14
    "元"              -0.13
    "iw"              -0.13
    Positive Logits
    " why"            0.23
    " justify"        0.22
    " justification"  0.20
    "why"             0.18
    " excuse"         0.18
    " reasons"        0.17
    "justify"         0.16
    " rationale"      0.16
    "ably"            0.16
    " justified"      0.16
    Activation Density: 0.057%

    No Known Activations