INDEX
    Explanations

    mentions of physical harm or destruction by fire

    references to fire, burning, and related injuries or destruction

    Negative Logits
    Token        Logit
    "onsense"    -0.75
    "ournal"     -0.74
    " reluct"    -0.71
    "awaru"      -0.69
    "egal"       -0.68
    "udeau"      -0.67
    "ensical"    -0.66
    "alian"      -0.66
    "ortun"      -0.66
    "remlin"     -0.65

    Positive Logits
    Token        Logit
    " burning"   1.14
    " burn"      1.12
    "ished"      1.04
    " burns"     1.04
    "ishing"     1.01
    " hotter"    0.99
    "burning"    0.98
    " burned"    0.95
    "ishes"      0.89
    " burner"    0.86
    Activation Density: 0.018%
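
    Activation density is the share of tokens on which the feature fires. If "fires" means an activation above zero (the threshold is an assumption, not stated on this page), 0.018% works out to roughly 18 activating tokens per 100,000. A minimal sketch of that arithmetic, with a hypothetical helper `activation_density` and made-up data:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Treating "active" as "above zero" is an assumption; a dashboard may
    use a different cutoff.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: 18 firing tokens out of 100,000.
acts = [0.0] * 99_982 + [1.3] * 18
print(f"{activation_density(acts):.3f}%")  # prints 0.018%
```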

    No Known Activations