INDEX

Explanations

bribe, bribes, bribery

This neuron detects mentions of bribery or corrupt payments (e.g., “bribe,” “bribing,” “bribes,” “kickbacks”).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

-3.05
-3.00
-2.84
-2.80
-2.78
-2.75
<em>    -2.67
is    -2.55
ch    -2.48
']    -2.45

Positive Logits

 花纹    2.34
潆    2.23
歘    2.20
鉏    2.16
籥    2.13
 dijeron    2.09
颏    2.09
お勧め    2.06
pág    2.05
щество    2.03

Activations Density 0.004%
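
To give the density figure a sense of scale, the arithmetic can be sketched as follows. This is a rough estimate that assumes "Activations Density" means the share of dashboard tokens on which the neuron has a nonzero activation; the page itself does not define the term, and the function name `expected_active_tokens` is purely illustrative.

```python
# Figures taken from the dashboard configuration above.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.004  # reported "Activations Density"


def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated number of tokens that activate).

    Assumption: density is the percentage of all dashboard tokens
    with a nonzero activation for this neuron.
    """
    total_tokens = num_prompts * tokens_per_prompt
    return total_tokens, total_tokens * density_percent / 100.0


total, active = expected_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total:,}")
print(f"approx. active tokens: {active:.0f}")
```

Under that assumption, the neuron fires on roughly 126 of about 3.1 million tokens, which fits a detector for a narrow topic such as bribery.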