INDEX
    Explanations

    instances of destruction or harmful actions involving fire

    Negative Logits
    Token         Logit
    " Fou"        -0.16
    " Folk"       -0.15
    "èħ"          -0.15
    "Fraction"    -0.14
    "Foo"         -0.14
    " Fauc"       -0.14
    " Flush"      -0.14
    "fallback"    -0.14
    " Fu"         -0.14
    " Fool"       -0.14
    Positive Logits
    Token         Logit
    " fire"       1.03
    "fire"        0.83
    " Fire"       0.83
    "-fire"       0.81
    "Fire"        0.78
    "_fire"       0.70
    " fires"      0.70
    ".fire"       0.70
    " FIRE"       0.69
    "çģ«"         0.68
    Activation Density 0.117%

    No Known Activations
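
Two tokens in the logit lists, çģ« and èħ, look garbled but are probably byte-level BPE vocabulary strings rather than encoding damage. The sketch below assumes a GPT-2-style byte-to-character table, which the page itself does not confirm, and maps each token back to its raw bytes. The names byte_decoder and token_to_bytes are hypothetical helpers written for this illustration.

```python
def byte_decoder():
    """Build the character-to-byte table of a GPT-2-style byte-level BPE.

    Assumption: the model behind this feature uses that scheme. Printable
    bytes map to themselves; every other byte is shifted to code point 256+n.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {chr(b): b for b in printable}
    shift = 0
    for b in range(256):
        if b not in printable:
            table[chr(256 + shift)] = b
            shift += 1
    return table


DECODER = byte_decoder()


def token_to_bytes(token: str) -> bytes:
    """Map a vocabulary string back to the raw bytes it stands for."""
    return bytes(DECODER[ch] for ch in token)


fire = token_to_bytes("çģ«")
print(fire, fire.decode("utf-8"))   # the bytes E7 81 AB are UTF-8 for 火

partial = token_to_bytes("èħ")
print(partial)                      # E8 85: an incomplete UTF-8 sequence
```

Under that assumption, çģ« decodes to 火, the Chinese character for fire, which fits the rest of the positive logits. èħ yields only the first two bytes of a three-byte character, so it cannot be decoded on its own.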