    Explanations

    phrases related to negative outcomes or consequences

    instances of the word "fire" and its variations in various contexts

    Negative Logits
    Token           Logit
    "meric"         -0.85
    "anson"         -0.81
    "sembly"        -0.75
    "sie"           -0.74
    " Redmond"      -0.74
    "omo"           -0.72
    " Citizen"      -0.71
    "eston"         -0.68
    "VIDIA"         -0.67
    " Gutenberg"    -0.65

    Positive Logits
    Token           Logit
    "flies"         1.13
    "fly"           0.97
    "lda"           0.81
    "fighter"       0.80
    "proof"         0.78
    "storm"         0.77
    "ricanes"       0.75
    " hotter"       0.74
    " extingu"      0.73
    "fighters"      0.72
    Act Density 0.016%
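
    A density of 0.016% means the feature is active on roughly 1.6 tokens per 10,000. The sketch below assumes activation density is the percentage of tokens whose activation is above zero; the function name and the sample data are hypothetical and only illustrate that arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: density is defined as (active tokens / total tokens) * 100.
    """
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 2 active tokens out of 12,500 tokens.
acts = [0.0] * 12_500
acts[10] = 3.2
acts[4000] = 1.1
print(f"{activation_density(acts):.3f}%")  # 0.016%
```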

    No Known Activations