INDEX
    Explanations

    phrases associated with providing explanations or justifications

    the phrase "the reason" and variations of it

    Negative Logits
    Token                  Logit
    "KY"                   -0.74
    "chron"                -0.63
    " Carbuncle"           -0.61
    "rog"                  -0.60
    " helicop"             -0.60
    "inav"                 -0.60
    "wana"                 -0.59
    " neighb"              -0.57
    "borg"                 -0.57
    "ALTH"                 -0.57

    Positive Logits
    Token                  Logit
    " why"                 1.20
    "abl"                  1.10
    "why"                  0.97
    " WHY"                 0.94
    " behind"              0.86
    "ably"                 0.76
    " Why"                 0.74
    " cited"               0.73
    "Why"                  0.73
    "quickShipAvailable"   0.71
    Activation Density: 0.029%

    No Known Activations