    Explanations

    The neuron detects tokens in Wikipedia category listings, especially the “Category:” lines at the ends of articles.

    Negative Logits
    Token             Logit
    " Hastings"       -0.07
    ".Month"          -0.06
    " selects"        -0.06
    "355"             -0.06
    "Flexible"        -0.06
    "earned"          -0.06
    " safety"         -0.06
    "maintenance"     -0.06
    "$q"              -0.06
    " далі"           -0.06

    Positive Logits
    Token             Logit
    "veget"           0.07
    "の方"            0.07
    " Af"             0.07
    "_encoder"        0.06
    "-*-"             0.06
    " zurück"         0.06
    "けど"            0.06
    "assignment"      0.06
    " haircut"        0.06
    " Somebody"       0.06

    Activation Density: 0.018%

    No Known Activations