Explanations

questions and informal phrasing

The neuron fires strongly on the first content word or phrase of a new segment, that is, the leading content word immediately after a <start> token, so it effectively marks the start of a segment.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

近年来

0.56

 செயல்பா

0.55

 વિવિધ

0.55

较为

0.55

 intricately

0.55

 számos

0.53

มีการ

0.52

各类

0.52

一系列

0.52

 culminating

0.52

Positive Logits

 yaar

0.89

 यार

0.88

 नहीं

0.86

 değil

0.81

😘

0.81

😏

0.81

!')

0.81

!")

0.79

🖕

0.79

 😏

0.78

Activations Density 0.578%
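
As a rough sanity check on what this density means in absolute terms, the sketch below multiplies it by the size of the dashboard prompt set listed under Configuration (392,802 prompts, 256 tokens each). It assumes the density was measured over that same prompt set, which the page does not state, so the result is only an order-of-magnitude estimate. The function names are illustrative and not part of any dashboard API.

```python
# Figures copied from the dashboard page.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
ACTIVATION_DENSITY = 0.578 / 100  # 0.578% expressed as a fraction


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density: float) -> int:
    """Approximate count of token positions with non-zero activation.

    Assumption: the density applies to this exact prompt set.
    """
    return round(total * density)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under that assumption, the neuron would be active on roughly 0.58 million of about 100.6 million token positions.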