    Explanations

    The neuron activates on occurrences of the word “inherit” (and its close morphological variants) in text.
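
The explanation above can be spot-checked against raw text with a simple pattern match. The following is a minimal sketch, not the method used to produce the explanation: the suffix list, the name `INHERIT_RE`, and the helper `inherit_spans` are all assumptions introduced for illustration, and the sample sentence is made up.

```python
import re

# Hypothetical pattern: "inherit" plus a hand-picked list of common suffixes.
# The suffix list is an assumption; the source only says
# "close morphological variants".
INHERIT_RE = re.compile(
    r"\binherit(?:s|ed|ing|ance|ances|or|ors|able)?\b",
    re.IGNORECASE,
)

def inherit_spans(text):
    """Return (start, end, matched_word) for each match in `text`."""
    return [(m.start(), m.end(), m.group(0)) for m in INHERIT_RE.finditer(text)]

# Made-up sample text, not taken from any dataset.
sample = "She inherited the house; the class Inherits from Base. Heritage is different."
for span in inherit_spans(sample):
    print(span)
```

Words that are only related in meaning, such as "heritage" or "heirs", are deliberately not matched here, so any activations on those words would show up as disagreement between the pattern and the neuron.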

    Negative Logits
    Token          Logit
    " spo"         -0.08
    " tune"        -0.08
    "utdown"       -0.08
    (not shown)    -0.07
    " box"         -0.07
    " станд"       -0.07
    " ölçü"        -0.07
    "575"          -0.07
    " bang"        -0.07
    " blocked"     -0.07
    Positive Logits
    Token             Logit
    "inherit"         0.09
    " inherited"      0.09
    " inher"          0.07
    " inherit"        0.07
    " inheritance"    0.07
    " Inherits"       0.07
    " heritage"       0.07
    "HER"             0.07
    " heirs"          0.07
    " inequality"     0.07
    Activation Density: 0.010%

    No Known Activations