    Explanations

    The neuron activates on occurrences of the word “endogenous.”

    Negative Logits
    "cut"               -0.07
    " Marriage"         -0.07
    "HELL"              -0.07
    " گفت"              -0.07
    "ували"             -0.07
    " flame"            -0.07
    " Strike"           -0.07
    (token not shown)   -0.07
    (token not shown)   -0.06
    " parole"           -0.06
    Positive Logits
    "ademic"             0.07
    " unreasonable"      0.07
    "{})"                0.07
    " apolog"            0.06
    "уль"                0.06
    " SAS"               0.06
    "	register"         0.06
    "_DH"                0.06
    ".'),↵"              0.06
    " ath"               0.06
    Activation Density: 0.002%
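    A density of 0.002% means the neuron is active on roughly 2 of every 100,000 tokens. The sketch below shows one plausible way to compute such a figure. It assumes that "active" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold are hypothetical, and the viewer's actual threshold and token corpus are not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.
    Assumption: a token counts as active when activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 2 active positions out of 100,000 tokens.
acts = [0.0] * 99_998 + [1.2, 0.5]
print(f"Act Density {activation_density(acts):.3f}%")  # prints "Act Density 0.002%"
```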

    No Known Activations