INDEX
    Explanations

    actions related to legal or medical outcomes with potential negative consequences

    Negative Logits
    "idth"            -0.79
    "reci"            -0.61
    "lished"          -0.60
    "overty"          -0.59
    "hov"             -0.58
    "heny"            -0.56
    "rouse"           -0.56
    "icipated"        -0.55
    " RPGs"           -0.55
    " Peaks"          -0.54

    Positive Logits
    " anyway"          1.00
    " afterwards"      0.98
    " afterward"       0.96
    " anyways"         0.95
    " instantly"       0.93
    " promptly"        0.92
    " accordingly"     0.92
    " unanimously"     0.91
    " luckily"         0.89
    " shortly"         0.88
    Activation Density: 0.464%

    No Known Activations