    Explanations

    This neuron detects occurrences of the word “popular” and related terms referring to popular media or culture.

    Negative Logits
    otypical      -0.07
    )test         -0.07
    "]));↵        -0.07
    дать          -0.07
     emacs        -0.07
    лася          -0.07
    치는          -0.06
    FormControl   -0.06
    ecome         -0.06
     Nir          -0.06
    Positive Logits
     popular      0.10
     Popular      0.07
    ,**           0.07
    popular       0.07
     errone       0.07
    σφα           0.06
     seller       0.06
    níci          0.06
    VT            0.06
    Verbose       0.06
    Act Density 0.008%

    No Known Activations