Explanations

handle with care, caution, precautions

This neuron mainly detects precautionary or warning instructions: phrases that advise caution or care, and regulatory disclaimers.

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

"remos"         -0.85
"nip"           -0.82
"acked"         -0.81
"ЛУ"            -0.76
"ografía"       -0.74
"ję"            -0.73
"🏆"            -0.73
"ԁ"             -0.73
"📩"            -0.71
"Truth"         -0.71

Positive Logits

" care"         5.31
" caution"      4.03
"care"          3.41
" Care"         3.38
" cuidado"      3.31
"Care"          3.08
" careful"      2.98
" precautions"  2.91
" CARE"         2.69
" precaution"   2.69
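
The page does not say how these logit values were computed, so the following is only a minimal sketch of the ranking step: given a mapping from token to logit effect, it picks the most promoted and most suppressed tokens. The `effects` dictionary is a hypothetical toy sample holding a few of the listed entries, and `top_k` is an illustrative helper, not part of any dashboard API.

```python
# Hypothetical toy sample: a few token -> logit-effect pairs copied from the listing.
# Leading spaces are part of the token, so " care" and "care" are distinct entries.
effects = {
    " care": 5.31,
    " caution": 4.03,
    "care": 3.41,
    "remos": -0.85,
    "nip": -0.82,
    "acked": -0.81,
}


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


top_positive = top_k(effects, 2, positive=True)
top_negative = top_k(effects, 2, positive=False)

print("promoted:  ", top_positive)
print("suppressed:", top_negative)
```

Note the asymmetry in the listing itself: the strongest positive effect (5.31) is several times larger in magnitude than the strongest negative one (-0.85), which is consistent with a neuron that mostly promotes a small family of "care" and "caution" tokens.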

Activations Density 0.082%
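
To put the density figure in perspective, here is a rough back-of-the-envelope calculation. It assumes, without confirmation from the page, that the density is the fraction of token positions in the dashboard prompts (24,576 prompts of 128 tokens each) on which the neuron fires. The variable names are illustrative.

```python
# Figures taken from the Configuration section and the density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.082

# Total token positions covered by the dashboard prompts.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = (positions with non-zero activation) / (all positions).
expected_active = total_tokens * DENSITY_PERCENT / 100

print(f"total token positions: {total_tokens:,}")
print(f"approx. active positions: {expected_active:,.0f}")
```

Under that assumption the neuron would fire on roughly 2,600 of about 3.1 million token positions, which fits a sparse feature tied to a narrow set of cautionary phrases.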