INDEX
    Explanations

    references to immediate actions or events, particularly those indicating urgency or direct consequence

    Negative Logits
    Token           Logit
    "ango"          -0.15
    " Mold"         -0.15
    "dl"            -0.15
    "encing"        -0.15
    "ANGO"          -0.15
    "cela"          -0.14
    "pector"        -0.14
    "exo"           -0.14
    " arguments"    -0.14
    "ιακ"           -0.14
    Positive Logits
    Token           Logit
    "485"           0.15
    "adora"         0.15
    "zy"            0.15
    "embr"          0.15
    "yan"           0.15
    "869"           0.14
    " Kir"          0.14
    "atri"          0.14
    "untu"          0.14
    "516"           0.14
    Act Density 0.013%

    No Known Activations