INDEX
    Explanations

    refusal
    I apologize for not fulfilling requests
    cannot comply

    Negative Logits
    Token                 Value
    " apologies"          1.05
    " apologised"         1.03
    " apologized"         1.02
    " apology"            1.02
    " apologize"          1.01
    " apologizing"        1.01
    " apologise"          1.00
    " apolog"             0.98
    " sorry"              0.94
    " Sorry"              0.94
    Positive Logits
    Token                 Value
    " satisfying"         0.44
    " Satisf"             0.43
    " satisfactory"       0.42
    " 만족"               0.41
    " unsatisf"           0.39
    "満足"                0.38
    " satisfy"            0.37
    "惊喜"                0.36
    " estran"             0.36
    " necessariamente"    0.35
    Activation Density: 0.017%

    No Known Activations