INDEX

Explanations

biased or partisan

The neuron fires on language that signals external influence, bias, or corrupting financial/political motives (e.g. “influenced by the financial interests,” “corrupt nature,” “paid,” “marketing,” “biased”).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token        Logit
"itars"      -0.99
"]$,"        -0.91
" изменя"    -0.91
"羨"         -0.91
"lowns"      -0.88
"วา"         -0.88
" odleg"     -0.84
"ņas"        -0.84
" графи"     -0.83
" alcune"    -0.83

Positive Logits

Token        Logit
" biased"    1.42
" bias"      1.20
"bias"       1.16
"biased"     1.14
" partisan"  1.06
"Bias"       0.97
" Trouver"   0.91
"偏"         0.88
"hos"        0.88
" biases"    0.87

Activations Density 0.037%
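
To give the density figure a sense of scale, the sketch below converts it into an approximate count of active token positions. It assumes the density was measured over the dashboard prompt set listed under Configuration (24,576 prompts of 128 tokens each), which this page does not state explicitly; the variable names are illustrative only.

```python
# Rough scale estimate for the activation density.
# Assumption: the 0.037% density is measured over the dashboard prompt set
# (24,576 prompts x 128 tokens); the page does not confirm this.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.037 / 100  # 0.037% expressed as a fraction

# Total token positions in the prompt set.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Approximate number of positions on which the neuron is active.
estimated_active_tokens = total_tokens * ACTIVATION_DENSITY

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under that assumption, the neuron would be active on roughly 1,164 of about 3.1 million token positions, consistent with a sparse, narrowly targeted feature.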