Explanations

surprise, horror, enjoyment, delight, detriment

The neuron detects personal-emotion reaction phrases introduced by “to my/your…” (e.g. “to my surprise,” “to your horror,” “much to my delight”).
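As a rough illustration of the phrase family described above, the sketch below matches "to my/your" followed by one of the listed nouns, optionally preceded by "much". It is only a hypothetical text-level approximation: the names `REACTION_NOUNS`, `REACTION_PHRASE`, and `find_reaction_phrases` are invented here, the noun list is simply the five words from the explanation, and the neuron itself is not a regular expression and may well respond to other nouns.

```python
import re
from typing import List

# Hypothetical noun list, copied from the explanation text above.
REACTION_NOUNS = ("surprise", "horror", "enjoyment", "delight", "detriment")

# Optional "much", then "to my" or "to your", then one of the nouns.
REACTION_PHRASE = re.compile(
    r"\b(?:much\s+)?to\s+(?:my|your)\s+(?:" + "|".join(REACTION_NOUNS) + r")\b",
    re.IGNORECASE,
)


def find_reaction_phrases(text: str) -> List[str]:
    """Return every matched reaction phrase, in order of appearance."""
    return [match.group(0) for match in REACTION_PHRASE.finditer(text)]


if __name__ == "__main__":
    sample = "Much to my delight, it worked; to your horror, it broke."
    print(find_reaction_phrases(sample))
```

A pattern like this can help when skimming top-activating examples by hand, but it says nothing about how strongly the neuron fires on any given match.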

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
"to"                -2.78
" variously"        -1.59
"now"               -1.55
"for"               -1.50
" such"             -1.49
" about"            -1.43
" said"             -1.43
"not"               -1.40
" either"           -1.36
" sicherzustellen"  -1.36

Positive Logits

Token               Logit
"栃木"              1.67
"Penjelasan"        1.66
[blank]             1.53
"Really"            1.48
"Какой"             1.47
" สอง"              1.47
" certains"         1.45
" parecia"          1.45
" éta"              1.45
"Another"           1.45

Activations Density 0.008%
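
To give a sense of scale for the density figure, the short calculation below combines it with the prompt counts listed under Configuration. It assumes, without confirmation from the page, that the density is the fraction of tokens in that same prompt set on which the neuron has a non-zero activation. Because 0.008% is itself a rounded value, the result is only a ballpark.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.008

# Total number of tokens across all dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = share of those tokens with non-zero activation.
approx_active_tokens = total_tokens * DENSITY_PERCENT / 100

print(total_tokens)                 # 3145728
print(round(approx_active_tokens))  # 252
```

Under that assumption the neuron fires on roughly 250 of about 3.1 million tokens, which is consistent with a feature tied to a narrow phrase pattern.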