INDEX
    Explanations

    words and phrases related to apologies and admitting mistakes

    Negative Logits
    Token       Logit
    "arga"      -0.16
    "908"       -0.15
    "IPP"       -0.14
    "ожеÑĤ"     -0.14
    "ahlen"     -0.14
    "¤íĶĦ"      -0.14
    "Alarm"     -0.14
    "é¼ĵ"       -0.14
    "Reuse"     -0.13
    ".spy"      -0.13
    Positive Logits
    Token           Logit
    " apology"      0.58
    " apologies"    0.54
    " apolog"       0.51
    " apologize"    0.50
    " apologized"   0.49
    " apologise"    0.46
    " Ap"           0.44
    " sorry"        0.42
    "Ap"            0.40
    " remorse"      0.38
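
The dashboard does not say how these logit values are computed. One common reading, assumed here, is that each value is the dot product of the feature's output direction with a token's unembedding column, so positive entries are tokens the feature promotes and negative entries are tokens it suppresses. A minimal sketch under that assumption follows; `logit_effects`, `top_k`, the two-dimensional `direction`, and the `unembed` matrix are hypothetical, and the numbers are toy values, not data from this feature.

```python
# Toy sketch: rank tokens by how much a feature direction raises or lowers
# their logits. All names and numbers are illustrative assumptions.

def logit_effects(direction, unembed, vocab):
    """Return (token, effect) pairs, where effect = direction . unembed[:, j]."""
    effects = []
    for j, token in enumerate(vocab):
        effect = sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        effects.append((token, effect))
    return effects


def top_k(effects, k, positive=True):
    """Return the k most positive (or most negative) token effects."""
    return sorted(effects, key=lambda pair: pair[1], reverse=positive)[:k]


# Hypothetical 2-dimensional model with a 4-token vocabulary.
vocab = [" apology", " sorry", "Alarm", "Reuse"]
direction = [1.0, -0.5]            # assumed feature output direction
unembed = [                        # assumed unembedding matrix, shape (2, 4)
    [0.5, 0.25, -0.25, 0.0],
    [-0.25, -0.5, 0.0, 0.25],
]

effects = logit_effects(direction, unembed, vocab)
print("positive:", top_k(effects, 2))
print("negative:", top_k(effects, 2, positive=False))
```

Under this reading, the small magnitudes in the negative list (around -0.14) suggest the feature suppresses no token in particular, while the positive list is clearly concentrated on apology-related tokens.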
    Activation Density: 0.396%
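
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all; the dashboard does not state its exact definition, so the sketch below assumes "activation strictly above a threshold of zero". The function name `activation_density` and the sample values are hypothetical.

```python
# Toy sketch: activation density as the share of tokens with a nonzero
# activation. The threshold and the sample data are assumptions.

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [1.2, 0.3, 2.5, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well under 1%, like the value reported here, is consistent with a sparse feature that fires only on a narrow set of contexts.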

    No Known Activations