    Explanations

    phrases related to offering apologies

    Negative Logits
    Token             Logit
    "nesota"          -0.77
    "insula"          -0.71
    "iltration"       -0.71
    "irrel"           -0.70
    "eele"            -0.68
    " cryptoc"        -0.66
    "arnaev"          -0.66
    "tnc"             -0.66
    "Ranked"          -0.66
    " infiltration"   -0.64
    Positive Logits
    Token             Logit
    " sorry"          0.86
    "faced"           0.78
    "fully"           0.75
    "GES"             0.72
    "BLE"             0.71
    " excuse"         0.71
    " Guilty"         0.69
    "sorry"           0.66
    "giving"          0.66
    "face"            0.66
    Act Density 0.012%

    No Known Activations