INDEX
    Explanations

    actions involving explanations or reasons

    words associated with explaining or giving reasons

    Negative Logits
    "luster"    -0.81
    "assic"     -0.69
    "dar"       -0.67
    "rica"      -0.67
    "ille"      -0.66
    "sembly"    -0.65
    "inates"    -0.64
    "Pont"      -0.63
    "itton"     -0.63
    "oreal"     -0.63
    Positive Logits
    " why"      1.75
    " WHY"      1.47
    " how"      1.36
    "why"       1.34
    "how"       1.04
    " HOW"      0.97
    " Why"      0.97
    "Why"       0.91
    " what"     0.90
    " away"     0.83
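    The two lists rank tokens by how strongly this feature raises or lowers their logits. A minimal sketch of that ranking, assuming the dashboard simply sorts per-token logit effects and keeps the k largest and k smallest; the `logit_effects` dictionary and the `top_logits` helper are hypothetical, and the values are a subset copied from the lists.

```python
import heapq

# Hypothetical per-token logit effects for this feature (subset of the lists above).
# Leading spaces are significant: " why" and "why" are different tokens.
logit_effects = {
    " why": 1.75,
    " WHY": 1.47,
    " how": 1.36,
    "why": 1.34,
    "luster": -0.81,
    "assic": -0.69,
    "dar": -0.67,
}


def top_logits(effects, k):
    """Return the k most positive and the k most negative (token, value) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda item: item[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda item: item[1])
    return positive, negative


positive, negative = top_logits(logit_effects, 3)
for token, value in positive + negative:
    print(f"{token!r:>10} {value:+.2f}")
```

    Quoting each token with `!r` keeps leading spaces visible, which is why the lists above show tokens in quotes.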
    Activation Density: 0.069%

    No Known Activations