INDEX
    Explanations

    expression of regret or apology

    expressions of apology or regret

    Negative Logits
     Allied
    -0.74
     Bourbon
    -0.73
     Rhodes
    -0.72
     Tripoli
    -0.70
     mids
    -0.68
     RAF
    -0.67
     civilian
    -0.67
     JPEG
    -0.67
     Cincinnati
    -0.66
     Flickr
    -0.66
    Positive Logits
    cause
    1.05
    U+FE0F (variation selector-16)
    1.03
    mean
    0.97
    ────
    0.93
    thing
    0.93
    shall
    0.92
    agree
    0.92
    want
    0.92
    laughs
    0.90
    exist
    0.90
    Act Density 0.207%

    No Known Activations