INDEX
    Explanations

    lists of categories

    The neuron activates on tokens appearing in Wikipedia “Category:” lines.

    Negative Logits
    Token             Logit
    " grams"          -0.07
    "hei"             -0.07
    "ама"             -0.07
    ".environment"    -0.07
    "/tag"            -0.07
    "official"        -0.06
    " unst"           -0.06
    "[at"             -0.06
    " docs"           -0.06
    " Embed"          -0.06
    Positive Logits
    Token             Logit
    "最后"            0.07
    "_DEVICES"        0.07
    "\Field"          0.07
    " 대한"           0.06
    "\L"              0.06
    "ัฒนา"            0.06
    " podnikatel"     0.06
    "Opened"          0.06
    " náro"           0.06
    " subway"         0.06
    Activation Density: 0.055%

    No Known Activations