    Explanations

    phrases related to excuses or justifications

    Negative Logits
    semble          -0.92
    ropolitan       -0.76
    erial           -0.75
    opy             -0.75
    ymph            -0.73
    marks           -0.72
    ropolis         -0.72
    opers           -0.70
    efully          -0.70
    mark            -0.70
    Positive Logits
     excuse         1.08
     justifying     1.04
     WHY            1.00
     excuses        0.96
     explanations   0.96
     explanation    0.94
     rationale      0.93
     why            0.91
     explaining     0.89
     justification  0.88
    Activation Density 0.129%
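    Read as a share of tokens, a density of 0.129% would mean the feature is active on roughly 129 of every 100,000 tokens. The sketch below assumes, as an unconfirmed convention, that density is the percentage of sampled tokens whose activation exceeds zero. The function `activation_density` and the activation values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 2,000 tokens: the feature fires on 3 of them
acts = [0.0] * 2000
acts[10] = 1.7
acts[500] = 0.4
acts[1999] = 2.2

print(f"{activation_density(acts):.3f}%")
```

    With 3 firing tokens out of 2,000, this gives 0.150%, which is the same order of magnitude as the density reported for this feature.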

    No Known Activations