    Explanations

    This neuron activates on occurrences of the word “digraph” (i.e. the substring “dig‐raph”).

    Negative Logits
    Token           Logit
    " С"            -0.07
    "enant"         -0.07
    "Ease"          -0.07
    "chnitt"        -0.06
    " backs"        -0.06
    " waitress"     -0.06
    "isiert"        -0.06
    " fourn"        -0.06
    " insurers"     -0.06
    " /\."          -0.06

    Positive Logits
    Token           Logit
    "окумент"       0.06
    "614"           0.06
    "ाष"            0.06
    " occult"       0.06
    "Rocket"        0.06
    " stochastic"   0.06
    "venge"         0.06
    "nginx"         0.06
    " agar"         0.06
    ".Remote"       0.06
    Activation Density: 0.004%

    No Known Activations