Explanations

definition of string variable (negative activations)

Configuration

Prompts (Dashboard): 16,384 prompts, 256 tokens each
Dataset (Dashboard): neuronpedia/python-code-simplified

Negative Logits

"lot"            -0.08
"ub"             -0.04
"__"             -0.03
" representing"  -0.03
".m"             -0.03
".e"             -0.02
" that"          -0.02
" given"         -0.02
" each"          -0.02
"the"            -0.02

Positive Logits

" best"     0.09
"arch"      0.08
".random"   0.08
" random"   0.07
"random"    0.06
"(self"     0.05
"ateg"      0.05
" sele"     0.05
"valu"      0.05
"bab"       0.04

Activations Density 95.001%