INDEX
    Explanations

    mentions of motives or reasons behind actions

    references to motives behind actions or events

    Negative Logits
    semble        -0.94
    opy           -0.84
    thumbnails    -0.84
    alus          -0.81
    ropolis       -0.78
    ummer         -0.77
    ogun          -0.72
    hap           -0.72
    redd          -0.71
    mark          -0.70
    Positive Logits
     motives        1.15
     motive         1.10
     justifying     1.06
     motivations    1.03
     behind         1.00
     rationale      0.99
     motivation     0.94
     why            0.87
     justify        0.87
     reasoning      0.82
    Act Density 0.070%

    No Known Activations