    Explanations

    references to apologies and accountability in statements

    Negative Logits
    Token       Logit
    ".spy"      -0.13
    "tier"      -0.13
    "ativ"      -0.12
    "duit"      -0.12
    "ожет"      -0.12
    "ạnh"       -0.12
    "imir"      -0.12
    "orney"     -0.12
    "elter"     -0.12
    "鼓"        -0.12
    Positive Logits
    Token           Logit
    " apology"      0.69
    " apologies"    0.66
    " apolog"       0.61
    " apologize"    0.60
    " apologized"   0.60
    " apologise"    0.54
    " Ap"           0.47
    " sorry"        0.46
    " remorse"      0.46
    " repent"       0.45
    Activation Density: 0.238%

    No Known Activations