    Explanations

    This neuron detects occurrences of the concept of anonymity, especially the word “anonymous” and closely related contexts.

    Negative Logits
    Token          Logit
    " Well"        -0.08
    " well"        -0.08
    " Work"        -0.07
    "Relation"     -0.07
    "-hard"        -0.06
    "(@("          -0.06
    "table"        -0.06
    " Building"    -0.06
    "атку"         -0.06
    " 경기"         -0.06
    Positive Logits
    Token               Logit
    " Anonymous"        0.09
    " anonymous"        0.09
    "anonymous"         0.09
    " anonymously"      0.08
    " anonymity"        0.08
    "AllowAnonymous"    0.08
    " anon"             0.08
    "mon"               0.07
    "Anonymous"         0.07
    "-os"               0.07
    Activation Density: 0.004%
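
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions divided by total positions.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 4 active positions out of 100,000 tokens.
sample = [0.0] * 99_996 + [1.3, 0.7, 2.1, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # 4 / 100,000 formats as 0.004%
```

Under that assumption, a density of 0.004% corresponds to roughly one active token in every 25,000.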

    No Known Activations