Explanations

tutorial hell, negative events, undesirable traits

The neuron activates strongly on the first token of a new instructional step or sentence (i.e., the start-of-step or step-number header).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each
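
As a rough sense of scale, the prompt count and prompt length above imply the total number of token positions covered by the dashboard. The sketch below assumes every prompt is exactly 256 tokens long with no padding or truncation; the function name `total_tokens` is illustrative and not part of any dashboard tooling.

```python
def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Return the total number of token positions across all prompts."""
    return num_prompts * tokens_per_prompt


# Figures taken from the configuration above; a uniform prompt
# length is an assumption.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256

print(f"{total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT):,} tokens")
# prints "100,557,312 tokens"
```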

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to preserve leading spaces), value

"р"  0.78
"цкі"  0.76
" लोगो"  0.75
" cargas"  0.75
"м"  0.74
"あった"  0.71
"то"  0.71
"точные"  0.71
" человека"  0.70
" Побе"  0.69

Positive Logits

Token (quoted to preserve leading spaces), value

" અમા"  0.85
" assort"  0.83
" empê"  0.81
" sele"  0.80
"sul"  0.79
"频道"  0.78
" selezione"  0.77
"集團"  0.77
" kami"  0.75
"專"  0.75

Activations Density 0.000%