INDEX
    Explanations

    apologies or statements of regret

    Negative Logits
    "ccording"   -0.84
    "eele"       -0.80
    "edience"    -0.78
    "eely"       -0.73
    "irrel"      -0.72
    "inct"       -0.72
    "cffff"      -0.71
    "kefeller"   -0.71
    "weeney"     -0.70
    "hig"        -0.68
    Positive Logits
    " guys"     0.97
    " sorry"    0.96
    " folks"    0.87
    " excuse"   0.83
    " :("       0.82
    " about"    0.80
    " ladies"   0.79
    " sir"      0.79
    "fully"     0.78
    " bout"     0.78
    Act Density 0.020%

    No Known Activations