Explanations

avoiding or preventing negative outcomes

The neuron detects formal instructional or policy language that specifies requirements, rules, or clarifications.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"と共に"  0.79
"经历"  0.75
" Vars"  0.73
" طويل"  0.69
"infrastructure"  0.66
"lebr"  0.66
"を経て"  0.66
" क्षार"  0.66
" Karena"  0.65
"紆"  0.65

Positive Logits

" precau"  0.91
" foolproof"  0.88
"🚫"  0.79
" phrasing"  0.77
" cautions"  0.76
" menace"  0.75
" unambiguous"  0.75
" precautions"  0.75
" nowadays"  0.75
" evitar"  0.74

Activations Density 0.710%
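
To put the density figure in context, the sketch below works through the arithmetic implied by the configuration. It assumes, and this page does not confirm, that activation density means the percentage of tokens on which the feature's activation is greater than zero, and that the density was measured over the full prompt set listed under Configuration. The function names are hypothetical and are not part of any dashboard API.

```python
from typing import Iterable

# Figures copied from the Configuration and Activations Density entries.
NUM_PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
DENSITY_PERCENT = 0.710


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Approximate count of tokens on which the feature fires.

    ASSUMPTION: density is the percentage of tokens with a nonzero
    activation, measured over the whole prompt set.
    """
    return round(total * density_percent / 100)


def activation_density(activations: Iterable[float]) -> float:
    """Percentage of activations strictly greater than zero.

    ASSUMPTION: this mirrors how the dashboard figure is defined.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > 0)
    return 100.0 * active / len(values)


if __name__ == "__main__":
    total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
    print("total tokens:", total)
    print("estimated active tokens:",
          estimated_active_tokens(total, DENSITY_PERCENT))
```

Under those assumptions, the feature fires on well under one token in a hundred, which would fit a feature tied to a narrow kind of cautionary or precautionary wording.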