    Explanations

    excuses and justifications for actions or behaviors, particularly in relation to blame

    Negative Logits
    .opensource     -0.16
    isay            -0.16
    aises           -0.15
    eldon           -0.15
    egra            -0.14
    íģ              -0.14
    ÅĻe             -0.14
    ÑĨенÑĤÑĢа        -0.14
    fred            -0.14
    oÅĽci           -0.14
    Positive Logits
    oton            0.16
     tup            0.16
     Ulus           0.16
     excuse         0.16
    jit             0.15
     éģĵ            0.15
     arguments      0.15
    arg             0.15
    ulu             0.15
     justification  0.14
    Act Density 0.126%

    No Known Activations