Explanations

Human, Assistant, agent

This neuron detects speaker/role labels and dialogue turn markers (tokens that indicate who is speaking, like "assistant", "user"/"human", character names, or bracketed turn tags).

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

りの  0.38
 HPLC  0.34
々に  0.33
 permeable  0.33
%";  0.32
 dilat  0.32
 psychosocial  0.31
 DBMS  0.31
 einer  0.31
___  0.30

Positive Logits

Actually  0.35
uddho  0.33
Okay  0.32
 LikeLike  0.32
 తప్ప  0.32
这边  0.32
 Okay  0.31
 যাক  0.31
 โอ้  0.31
 Yeah  0.30

Activations Density 0.015%
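
To give a rough sense of scale, the density figure can be combined with the prompt configuration listed under Configuration. The sketch below rests on two assumptions that this page does not confirm: that every prompt contributes exactly 512 tokens, and that the density was measured over that same set of prompts. The function name `estimate_active_tokens` is hypothetical and used only for illustration.

```python
# Figures quoted on this page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY = 0.015 / 100  # 0.015% expressed as a fraction


def estimate_active_tokens(num_prompts, tokens_per_prompt, density):
    """Return (total tokens, estimated number of tokens with nonzero activation).

    Assumes all prompts have the same length and that the density applies
    to this exact token set; both are assumptions, not stated facts.
    """
    total = num_prompts * tokens_per_prompt
    return total, total * density


total_tokens, active_tokens = estimate_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated activating tokens: {active_tokens:,.0f}")
```

Under those assumptions the neuron would fire on roughly eighteen thousand of about 122 million tokens, which fits a detector for comparatively rare role-label positions.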