Explanations

The neuron flags strong profanity, especially multi-word or intensified swears (e.g. “God damn,” “fuck,” “cunt”), marking where highly offensive curse phrases occur.

Discussions of profanity and offensive speech, including meta-talk about speaking style and advice or warnings about using such language.

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token              Logit
" 운영"            0.89
"🔬"               0.87
"📈"               0.84
"🧪"               0.83
" bioinformatics"  0.82
" DeFi"            0.78
"ipynb"            0.74
" பணிகள்"          0.74
" OpenGL"          0.73
"📊"               0.73

Positive Logits

Token              Logit
" utterances"      2.67
" utterance"       2.61
" phrases"         2.54
" frases"          2.33
" verbal"          2.30
" phrasing"        2.29
" speech"          2.27
" uttered"         2.26
" phrase"          2.18
" words"           2.15

Activations Density 2.924%
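
To make the density figure concrete, here is a minimal sketch in Python. It assumes that "activation density" means the fraction of token positions where the activation is above zero; the dashboard does not state its definition, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the toy activation values are hypothetical and chosen only for illustration. The scale estimate further assumes that every prompt is exactly 512 tokens long.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: this mirrors what the dashboard calls "Activations Density".
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical, not taken from the dashboard): 2 of 8 tokens fire.
toy = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
toy_density = activation_density(toy)

# Scale implied by the configuration, assuming every prompt is exactly 512 tokens.
total_tokens = 238_145 * 512
approx_active_tokens = round(total_tokens * 0.02924)

print(f"toy density: {toy_density:.3f}")
print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens at 2.924%: {approx_active_tokens:,}")
```

Under these assumptions the prompt set covers about 121.9 million tokens, so a density of 2.924% corresponds to roughly 3.57 million token positions with a nonzero activation.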