    Explanations

    The neuron activates specifically on occurrences of the word “afraid.”

    Negative Logits
    Token          Logit
    " Winter"      -0.07
    "ilmiştir"     -0.07
    "(internal"    -0.07
    "ו�"           -0.07
    "uttle"        -0.07
    " iter"        -0.06
    " Zur"         -0.06
    "silver"       -0.06
    " Gerr"        -0.06
    "ers"          -0.06
    Positive Logits
    Token          Logit
    " afraid"      0.14
    "raid"         0.07
    "said"         0.07
    ".ReadFile"    0.07
    " ashamed"     0.07
    "ifen"         0.07
    " aque"        0.07
    " endPoint"    0.07
    "_backend"     0.06
    " readOnly"    0.06
    Activation Density: 0.004%

    No Known Activations