INDEX
    Explanations

    words related to triggers or mechanisms that initiate events or actions

    references to triggers, specifically related to actions or events that can initiate a significant reaction or consequence

    Negative Logits
     Flavoring
    -0.81
    apolis
    -0.75
     Partnership
    -0.70
    ¬¼
    -0.69
     Correspond
    -0.68
    aredevil
    -0.67
    sm
    -0.67
    uv
    -0.67
    atography
    -0.67
    hemat
    -0.67
    Positive Logits
    trigger
    1.40
     trigger
    1.40
     triggers
    1.28
     triggering
    1.22
     Trigger
    1.02
    Trigger
    0.91
     triggered
    0.87
     warnings
    0.85
    idon
    0.81
     derail
    0.80
    Activation Density 0.008%

    No Known Activations