    Explanations

    The neuron activates on proper nouns, such as names of people, brands, and models.

    Negative Logits
    Token           Logit
    " Viktor"       -0.07
    " шкі"          -0.07
                    -0.06
    "integral"      -0.06
    " weniger"      -0.06
    " architect"    -0.06
    " RTS"          -0.06
    "ріш"           -0.06
                    -0.06
    " kry"          -0.06
    Positive Logits
    Token           Logit
    " Amateur"      0.06
    "(op"           0.06
    " vrouw"        0.06
    " gdzie"        0.06
    " druhé"        0.06
    ".choice"       0.06
    "_COOKIE"       0.06
    "hub"           0.06
    ".bs"           0.06
    ",["            0.06
    Act Density 0.387%

    No Known Activations