Explanations

cost or costs

The neuron consistently activates on tokens that refer to cost or weight (e.g. "cost", "minimum cost", the Japanese 負の値 "negative value" and コスト "cost", and numeric cost values in code).

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token               Logit
"cismo"             -0.81
"aviar"             -0.79
"Cz"                -0.78
"ServletRequest"    -0.75
"chase"             -0.75
"joje"              -0.73
"んでる"            -0.73
"高等学校"          -0.72
" împ"              -0.72
" திரு"             -0.71

Positive Logits

Token       Logit
" cost"     2.95
" costs"    2.70
"cost"      2.34
"Cost"      2.16
" weight"   2.11
" weights"  2.02
" Cost"     1.98
"costs"     1.89
" Costs"    1.86
"Costs"     1.86
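
As a rough illustration of how ranked logit lists like the two above are commonly produced, the sketch below projects a feature direction onto each token's output embedding and sorts the resulting scores. This page does not state how its values were computed, so the method is an assumption. The function names (`logit_effects`, `top_k`), the three-dimensional direction, and all numeric values are hypothetical toy data, not the real model weights or the scores listed above.

```python
# Hypothetical toy setup: a 3-dimensional feature direction and a 4-token vocabulary.
# None of these numbers come from the dashboard; they only illustrate the ranking step.

def logit_effects(direction, unembed_rows, vocab):
    """Score each token by the dot product of the feature direction with its output embedding."""
    scores = []
    for token, row in zip(vocab, unembed_rows):
        scores.append((token, sum(d * w for d, w in zip(direction, row))))
    return scores

def top_k(scores, k, positive=True):
    """Return the k highest scores (positive=True) or the k lowest scores (positive=False)."""
    return sorted(scores, key=lambda pair: pair[1], reverse=positive)[:k]

direction = [1.0, 0.5, 0.0]                  # assumed feature output direction
vocab = [" cost", " costs", "chase", "Cz"]   # token strings; leading spaces are significant
unembed_rows = [
    [2.0, 1.0, 0.0],    # " cost"
    [1.5, 1.0, 0.3],    # " costs"
    [-0.5, 0.0, 1.0],   # "chase"
    [-1.0, 0.5, 0.2],   # "Cz"
]

scores = logit_effects(direction, unembed_rows, vocab)
top_positive = top_k(scores, 2, positive=True)
top_negative = top_k(scores, 2, positive=False)
print(top_positive)
print(top_negative)
```

Tokens with and without a leading space (for example " cost" and "cost") are distinct vocabulary entries, which is why both can appear in the same list with different scores.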

Activations Density: 0.035%
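
To put the density figure in context, the short calculation below estimates how many tokens the feature would fire on. It assumes the 0.035% density is measured over the same dashboard prompt set described in the configuration (24,576 prompts of 128 tokens each), which the page does not state explicitly.

```python
NUM_PROMPTS = 24_576        # from the dashboard configuration
TOKENS_PER_PROMPT = 128     # from the dashboard configuration
DENSITY_PERCENT = 0.035     # reported activation density, in percent

# Total tokens in the dashboard prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of tokens with a nonzero activation,
# assuming the density applies to this same prompt set.
expected_active = round(total_tokens * DENSITY_PERCENT / 100)

print(total_tokens, expected_active)
```

Under that assumption, the feature is active on roughly 1,100 of about 3.1 million tokens, which is consistent with a sparse, narrowly scoped feature.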