INDEX
    Explanations

    The neuron fires on words denoting close familial roles (e.g. “mother,” “daughter”).

    Negative Logits
    Token             Logit
    "ARE"             -0.07
    "_head"           -0.07
    " graduate"       -0.06
    " scor"           -0.06
    " Johnson"        -0.06
    " Sandwich"       -0.06
    " الدول"          -0.06
    (not rendered)    -0.06
    "_cum"            -0.06
    " apples"         -0.06
    Positive Logits
    Token             Logit
    " Kunst"          0.07
    " Familie"        0.06
    (not rendered)    0.06
    " %("             0.06
    " оказ"           0.06
    (not rendered)    0.06
    "eyi"             0.06
    " KD"             0.06
    ".onError"        0.06
    "-Pro"            0.06
    Act Density 0.013%
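
To give a rough sense of scale for the density figure, the sketch below converts a percentage into an expected count of active tokens. It assumes that "Act Density" means the fraction of sampled tokens on which the neuron has a nonzero activation; the page itself does not define the term, and the function name and the one-million-token sample size are hypothetical.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens with nonzero activation.

    Assumption (not stated on the page): density is the fraction of
    sampled tokens on which the neuron fires, given as a percentage.
    """
    return density_percent / 100.0 * n_tokens


# Hypothetical sample of one million tokens at the reported 0.013% density.
count = expected_active_tokens(0.013, 1_000_000)
print(f"about {round(count)} active tokens per million")
```

Under that assumption the neuron would be active on roughly one token in several thousand, which would make it a sparse unit.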

    No Known Activations