    Explanations

    This neuron activates on seemingly unrelated sets of words and has no apparent function.

    Negative Logits
    " ra"         -0.63
    " Ră"         -0.50
    "RAD"         -0.47
    " Rad"        -0.46
    " RAD"        -0.44
    " Ra"         -0.44
    " Rav"        -0.43
    "########."   -0.43
    " rad"        -0.43
    " ram"        -0.43
    Positive Logits
    "InstrumentedTest"  0.69
    " esternos"         0.68
    "ogaster"           0.63
    "CreateIndex"       0.60
    "})`"               0.60
    " enfans"           0.59
    "SourceChecksum"    0.59
    " vorbehalten"      0.59
    "setViewName"       0.59
    "<bos>"             0.59
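Lists like the negative and positive logits above are often read off a unit's direct effect on the output: project its output weights through the unembedding matrix and keep the most negative and most positive tokens. This page does not say exactly how its values were computed, so the following is only a minimal sketch under that assumption. The functions `logit_effect` and `top_bottom_tokens` and the toy weights are hypothetical, and a real pipeline would typically also account for the final layer norm, which this sketch ignores.

```python
# Sketch: rank vocabulary tokens by a neuron's direct effect on the logits.
# Assumptions (hypothetical, not from this page): the effect is w_out @ W_U,
# no final layer norm is applied, and the toy weights below are made up.
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effect(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project the neuron's output vector (length d_model) through the
    unembedding matrix W_U (d_model rows x vocab columns): one score per token."""
    vocab_size = len(W_U[0])
    return [
        sum(w * row[t] for w, row in zip(w_out, W_U))
        for t in range(vocab_size)
    ]


def top_bottom_tokens(
    scores: Sequence[float], tokens: Sequence[str], k: int
) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
tokens = [" ra", "RAD", "CreateIndex", "<bos>"]
w_out = [1.0, -0.5]
W_U = [
    [-0.4, -0.3, 0.5, 0.2],
    [0.4, 0.2, -0.2, -0.6],
]

negative, positive = top_bottom_tokens(logit_effect(w_out, W_U), tokens, k=2)
for token, score in negative + positive:
    print(f"{token!r:16} {score:+.2f}")
```

Tokens are kept with their leading spaces, since " RAD" and "RAD" are distinct vocabulary entries, which is why both appear in the negative list.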
    Act Density 1.407%
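Activation density is commonly taken to be the fraction of token positions in a sample on which the unit is active. The page does not state its exact definition or dataset, so this is a minimal sketch assuming "active" means an activation above a threshold of 0; `activation_density` is a hypothetical helper.

```python
# Sketch: activation density as the fraction of token positions on which a
# unit fires. Assumption (hypothetical): "fires" means activation > threshold,
# with threshold 0 by default; the page does not state its exact definition.
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over 8 token positions; 2 of them are positive.
acts = [0.0, 0.3, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3%}")  # 25.000%
```

Under this definition, a density of 1.407% means the unit is active on roughly 14 of every 1,000 tokens.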

    No Known Activations