INDEX
    Explanations

    references to fire-related topics or terms

    Negative Logits
    zug      -0.18
    hlas     -0.16
    rej      -0.16
    onet     -0.15
    jen      -0.15
    sk       -0.15
    keit     -0.14
    rij      -0.14
    hari     -0.14
     hors    -0.14
    Positive Logits
    nze      0.25
    works    0.24
    places   0.22
    bird     0.21
    walls    0.21
    ball     0.21
     alarm   0.20
    work     0.20
    fly      0.20
    brand    0.20
    Activation Density: 0.016%

    No Known Activations