    Explanations

    events or actions that lead to significant consequences or outcomes

    Negative Logits
    " mercy"        -0.63
    "entric"        -0.61
    " symmetry"     -0.60
    "aves"          -0.60
    "avorite"       -0.59
    "arest"         -0.58
    " loopholes"    -0.58
    " Vaughn"       -0.57
    "irlf"          -0.56
    "afort"         -0.56

    Positive Logits
    "better"         0.95
    "gers"           0.94
    "ership"         0.81
    "hunt"           0.77
    "uez"            0.75
    " nowhere"       0.74
    "wig"            0.74
    "ges"            0.74
    "-+"             0.74
    "bare"           0.72

    Activation Density: 0.386%

    No Known Activations