    Explanations

    The neuron is detecting occurrences of the word “moths.”

    Negative Logits
     outlet           -0.06
     flooded          -0.06
    okers             -0.06
     đình             -0.06
    android           -0.06
    idine             -0.06
    ój                -0.06
    ucken             -0.06
    طفال              -0.06
     randint          -0.06
    Positive Logits
     irresponsible     0.07
     سع                0.07
     audits            0.07
     canoe             0.06
     siti              0.06
     Степ              0.06
     Zoom              0.06
    Drag               0.06
     меш               0.06
     Atmos             0.06
    Activation Density: 0.004%

    No Known Activations