Happy greetings and phrases

The neuron primarily detects the token "happy" and close variants such as "Happy", i.e., expressions of happiness and positive greetings.

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

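Negative Logits

(token not rendered)  0.85
(token not rendered)  0.76
(token not rendered)  0.74
"ক"  0.74  (Bengali letter ka)
(token not rendered)  0.72
(token not rendered)  0.70
(token not rendered)  0.69
"ف"  0.69  (Arabic letter fa)
(token not rendered)  0.68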

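Positive Logits

Tokens are shown in quotes; a leading space is part of the token.

" happy"  0.83
" feliz"  0.79  (Spanish/Portuguese, "happy")
" Happy"  0.75
" mutlu"  0.66  (Turkish, "happy")
" felices"  0.61  (Spanish, "happy", plural)
" felicidad"  0.60  (Spanish, "happiness")
"happy"  0.60  (no leading space)
" happier"  0.60
" bahagia"  0.59  (Indonesian/Malay, "happy")
" HAPPY"  0.57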
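The logit lists rank tokens by how much the neuron's output raises or lowers their logits. A common way to produce such lists is to project the neuron's output weight vector onto the model's unembedding matrix and keep the top and bottom entries; this page does not state how its numbers were computed, so the sketch below assumes that method. The weights, the vocabulary, and the helper names `logit_effects` and `top_and_bottom` are hypothetical toy values, not taken from the model.

```python
def logit_effects(w_out, W_U):
    """Logit effect of each vocab token: dot product of w_out with column j of W_U.

    w_out: d_model floats, the neuron's output weights (hypothetical toy values).
    W_U:   d_model rows x vocab_size columns, the unembedding (hypothetical toy values).
    """
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [sum(w_out[i] * W_U[i][j] for i in range(d_model)) for j in range(vocab_size)]


def top_and_bottom(effects, vocab, k):
    """Return the k tokens with the largest effects and the k with the most negative."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]                   # largest effects first
    negative = list(reversed(ranked[-k:]))  # most negative first
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of four tokens.
vocab = [" happy", " feliz", " sad", " angry"]
w_out = [1.0, 0.5]
W_U = [
    [0.8, 0.6, -0.7, -0.2],
    [0.4, 0.5, -0.1, -0.9],
]

effects = logit_effects(w_out, W_U)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights, `w_out` would presumably be the neuron's output weights in the MLP and `W_U` the model's unembedding; any scaling or normalization applied before display is not stated on this page.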

Activation density: 0.021%
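For a rough sense of scale, the density can be turned into a token count. This is a sketch that assumes, without confirmation from the page, that density is measured over the full configured prompt set (238,145 prompts of 512 tokens each) with every position counted; padding or a different evaluation set would change the result. `implied_active_tokens` is a hypothetical helper.

```python
def implied_active_tokens(num_prompts: int, tokens_per_prompt: int, density_percent: float) -> float:
    """Estimate how many token positions a given activation density implies.

    ASSUMPTION: density = activating positions / (num_prompts * tokens_per_prompt).
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens * density_percent / 100  # convert percent to a fraction


# Figures taken from this page's configuration and density lines.
total = 238_145 * 512
active = implied_active_tokens(238_145, 512, 0.021)
print(f"total token positions: {total:,}")             # 121,930,240
print(f"implied activating positions: {active:,.0f}")  # 25,605
```

Under that assumption the neuron fires on roughly one position in every 4,800, consistent with a narrow trigger such as the "happy" token family.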