INDEX
    Explanations

    phrases expressing apologies

    Negative Logits
    Token              Logit
    "tein"             -0.82
    "iltration"        -0.78
    "irrel"            -0.73
    "arnaev"           -0.73
    "insula"           -0.69
    "tnc"              -0.67
    " infiltration"    -0.67
    " helicop"         -0.66
    "minecraft"        -0.66
    "ccording"         -0.65
    Positive Logits
    Token              Logit
    " sorry"           1.00
    "faced"            0.81
    "GES"              0.80
    "sorry"            0.78
    "fully"            0.74
    " excuse"          0.74
    " Guilty"          0.71
    " Sorry"           0.70
    " pardon"          0.68
    "tm"               0.67
    Activation Density: 0.008%

    No Known Activations