Explanations

emoticons and affirmative phrases

The neuron detects tokens that belong to the model's direct factual answer or to highlighted content, especially proper nouns, numbers, and emphasized answer text.

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token               Logit
" teorie"           0.48
" carbone"          0.44
" troviamo"         0.42
" tanha"            0.42
" transgress"       0.41
" théorie"          0.40
" cynicism"         0.40
" rebellion"        0.40
" exacerbate"       0.39
" transgression"    0.39

Positive Logits

Token               Logit
":)"                0.59
":)"                0.53
":-)"               0.52
" 😊"               0.51
"it"                0.49
";)"                0.48
"or"                0.48
"不过"              0.47
"៕"                 0.47
" 🙂"               0.47

Activation Density: 0.654%
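
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It assumes that density means the fraction of dashboard tokens on which the neuron has a nonzero activation, and that it was measured over the full set of 238,145 prompts at 512 tokens each. The dashboard states neither point, so the result is a rough estimate only. The function names are hypothetical.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
DENSITY_PERCENT = 0.654


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total tokens in the dashboard sample, assuming every prompt is full length."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate number of tokens on which the neuron fires.

    Assumption (not confirmed by the dashboard): density is
    active tokens / total tokens, expressed as a percentage.
    """
    return round(total * density_percent / 100)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print(f"total tokens: {total:,}")
    print(f"estimated active tokens: {estimated_active_tokens(total, DENSITY_PERCENT):,}")
```

Under these assumptions the neuron fires on roughly 0.8 million of about 122 million tokens, which is consistent with a sparse feature that responds to a narrow class of contexts.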