Explanations

should probably advise

The neuron fires on hedging advice phrased as “should probably.”

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

-1.71

 such

-1.71

 \%$,

-1.65

-1.60

-1.52

“

-1.49

 after

-1.43

 since

-1.35

 with

-1.33

Positive Logits



2.00

 ビキニ

1.88

 レスト

1.67

 グラビア

1.66

＇

1.62

寕

1.60

 手绘

1.58

ícias

1.57

嫪

1.57

 チラ

1.56

Activations Density 0.013%
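
To put the density figure in context, the sketch below estimates how many tokens in the dashboard prompt set activate this neuron. It assumes that density means the fraction of all dashboard tokens with a nonzero activation, and that it was measured over the 24,576 prompts of 128 tokens listed under Configuration; neither assumption is confirmed by the page. The displayed 0.013% is rounded, so the result is only approximate. The function and constant names are illustrative.

```python
# Figures quoted on the dashboard.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.013  # rounded value as displayed


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Approximate count of tokens with nonzero activation.

    Assumes density is the fraction of all tokens that activate
    (an assumption, not something the dashboard states).
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:.0f}")
```

Under these assumptions the neuron fires on roughly four hundred tokens out of about three million, which fits an explanation as narrow as a single hedging phrase.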