INDEX
    Explanations

    references to burning and fire-related concepts

    Negative Logits
        Token          Logit
        " Fauc"        -0.17
        "531"          -0.17
        " strang"      -0.16
        "ilm"          -0.16
        "akt"          -0.15
        "215"          -0.15
        "ing"          -0.15
        "aden"         -0.15
        "d"            -0.15
        "ve"           -0.15
    Positive Logits
        Token                Logit
        " alive"             0.28
        " дот"               0.26
        "ished"              0.25
        "ốt"                 0.22
        "-toast"             0.22
        " Alive"             0.22
        "ISHED"              0.21
        "alive"              0.20
        " \xF0\x9F\x94"      0.20
        "ishing"             0.20
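    A minimal sketch of how token lists like the two above are commonly produced, assuming (this page does not say so) that each value is the feature's decoder direction projected through the model's unembedding matrix, with the k lowest and k highest entries reported. The names `feature_logit_effects`, `w_dec_row`, and `w_unembed` are hypothetical, and the toy matrix below is made up for illustration; it is not this feature's real weights.

```python
import numpy as np


def feature_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    w_dec_row: (d_model,) decoder vector of the feature (hypothetical input).
    w_unembed: (d_model, vocab_size) unembedding matrix (hypothetical input).
    """
    effects = w_dec_row @ w_unembed           # one value per vocabulary token
    order = np.argsort(effects)               # indices sorted ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 3, vocabulary of 5 tokens (made-up values).
vocab = [" alive", " Fauc", "x", "-toast", "ing"]
w_dec_row = np.array([1.0, 0.0, 0.0])
w_unembed = np.array([
    [0.28, -0.17, 0.05, 0.22, -0.15],
    [0.90, 0.30, -0.40, 0.10, 0.70],
    [-0.50, 0.20, 0.60, -0.30, 0.40],
])
neg, pos = feature_logit_effects(w_dec_row, w_unembed, vocab, k=2)
print(neg)  # lowest two: " Fauc", then "ing"
print(pos)  # highest two: " alive", then "-toast"
```

    Because the decoder vector here is a unit basis vector, the effects are just the first row of the toy matrix, which makes the ranking easy to check by hand.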
    Activation Density: 0.037%
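    Activation density is typically the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are illustrative assumptions, not taken from this page. At 0.037%, the feature would be active on roughly 37 of every 100,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption for illustration.
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size


# Synthetic activations: 37 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:37] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.037%
```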

    No Known Activations