INDEX

Explanations

reading bytes, code snippets, or processes

The neuron is primarily detecting positive, promotional adjectives (e.g. “cool,” “easy,” “fun,” “revealing”) used to praise or endorse content.


Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

" karate"      0.97
" nihil"       0.96
"ыр"           0.89
" awakening"   0.89
" cabaret"     0.89
" racist"      0.88
" Cartagena"   0.88
" Barça"       0.87
" onslaught"   0.86
" colegio"     0.85

Positive Logits

"Portable"     0.82
"Introducing"  0.81
"Rad"          0.79
"Deep"         0.75
"Embedded"     0.75
"Hi"           0.74
"Simple"       0.71
"Break"        0.71
"anu"          0.71
"Special"      0.70
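Token lists like the negative and positive logits are commonly produced by projecting a feature's output direction through the model's unembedding matrix and ranking the resulting per-token scores. This page does not say exactly how its values were computed, so the following is only a minimal NumPy sketch under that assumption. `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical names, and the dashboard may normalize scores or report magnitudes differently (its negative-logit values are shown without a sign).

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct effect of a feature direction on their logits.

    Hypothetical helper (not a dashboard API):
      decoder_dir: (d_model,) feature output direction
      W_U:         (d_model, d_vocab) unembedding matrix
      vocab:       list of d_vocab token strings
    Returns (negative, positive) lists of (token, score) pairs.
    """
    scores = np.asarray(decoder_dir) @ np.asarray(W_U)  # (d_vocab,) logit change per token
    order = np.argsort(scores)  # token indices from most negative to most positive score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -1.0, 2.0, 0.0],
                [3.0, 3.0, 3.0, 3.0]])
neg, pos = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('c', 2.0), ('a', 0.5)]
```

In a real model, `vocab` would come from the tokenizer, which is why tokens such as " karate" carry a leading space.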

Activation Density: 0.000%
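Activation density is usually the fraction of sampled token positions on which a feature fires. Suppose it is measured over the configured sample of 392,802 prompts of 256 tokens each, and the percentage is rounded to three decimal places (both are assumptions). Then a reading of 0.000% bounds how many tokens the feature could have fired on. The sketch below is illustrative only. `activation_density` is a hypothetical helper, and its firing threshold of 0 is an assumption.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Token count of the dashboard sample, from the Configuration section.
total_tokens = 392_802 * 256  # 100,557,312

# A density d displays as 0.000% when d * 100 < 0.0005, i.e. count * 200_000 < total_tokens
# (assuming round-half-up to three decimals). Integer arithmetic avoids float error.
max_silent_count = (total_tokens - 1) // 200_000

print(activation_density([0.0, 0.5, 0.0, 2.0]))  # 0.5
print(total_tokens, max_silent_count)  # 100557312 502
```

If those assumptions hold, the feature fired on at most 502 of the roughly 100.6 million sampled tokens.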