Explanations
This neuron detects occurrences of the word “popular” and related terms referring to popular media or culture.
Negative Logits
Token          Logit
otypical       -0.07
)test          -0.07
"]));↵         -0.07
дать           -0.07
emacs          -0.07
лася           -0.06
치는           -0.06
FormControl    -0.06
ecome          -0.06
Nir            -0.06
Positive Logits
Token          Logit
popular        0.10
Popular        0.07
,**            0.07
popular        0.07
errone         0.07
σφα            0.06
seller         0.06
níci           0.06
VT             0.06
Verbose        0.06
Activations Density: 0.008%