Explanations

thank you, please

The neuron strongly activates on polite request and gratitude phrases (e.g. “please,” “help,” “thanks,” “appreciated”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token             Logit
" therefore"      -1.19
"所以"             -1.17
" then"           -1.15
"so"              -1.14
" HOPE"           -1.05
" hope"           -1.02
" departament"    -1.02
" that"           -1.01
(not shown)       -0.98
"希望"             -0.96

Positive Logits

Token             Logit
"Föld"            1.26
" worst"          1.13
"Worst"           1.13
" ileti"          1.09
" Worst"          1.08
" vielleicht"     1.07
"Until"           1.06
" feel"           1.06
"畢竟"             1.06
"实在是"           1.05

Activations Density 0.069%
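
To give the density figure a sense of scale, the short sketch below converts it into an approximate token count. It assumes that the density is the fraction of tokens in the dashboard prompt set (24,576 prompts of 128 tokens each) on which the neuron fires; the page does not state this definition, so treat the result as a rough estimate. The function names are illustrative and not part of any dashboard API.

```python
# Figures copied from the dashboard; the interpretation of "density"
# (fraction of dashboard tokens with a nonzero activation) is an assumption.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.069


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate number of tokens on which the neuron activates."""
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)
print(f"{total:,} tokens in total, roughly {active:,} with a nonzero activation")
```

Under that assumption the neuron fires on about two thousand of the roughly three million dashboard tokens, which is in line with a feature tied to a narrow set of polite phrases.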