Explanations

but or and

The neuron is tuned to words that convey speaker stance or evaluation, i.e., modals and adverbs expressing certainty, possibility, or emphasis (hedges and epistemic markers).

Configuration

Prompts (Dashboard)

24,576 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

" kahit"  -0.88
"Ret"  -0.87
" good"  -0.85
"positorio"  -0.84
"$*$"  -0.81
"modes"  -0.76
"any"  -0.75
" 模式"  -0.74
" same"  -0.74
"implicitly"  -0.73

Positive Logits

"but"  2.14
" لكن"  1.33
" แต่"  1.26
" nhưng"  1.20
" लेकिन"  1.17
" лишь"  1.14
" ولی"  1.13
" αλλά"  1.13
" فقط"  1.13
" ancak"  1.05
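
The page does not say how the logit lists above are produced. A common approach, assumed here rather than confirmed by the source, is to project the feature's output direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below illustrates that ranking on a toy vocabulary; the direction, the unembedding vectors, and the helper names `logit_effects` and `top_k` are all hypothetical and do not come from the dashboard.

```python
# Toy sketch: rank vocabulary tokens by the logit contribution of one feature.
# ASSUMPTION: the dashboard's lists come from (feature direction) . (unembedding).
# All numbers below are made up for illustration only.

def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }

def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) scores."""
    ranked = sorted(effects.items(), key=lambda item: item[1], reverse=positive)
    return ranked[:k]

# Hypothetical 2-dimensional feature direction and 4-token vocabulary.
direction = [1.0, -0.5]
unembed = {
    " but": [2.0, 0.0],
    " however": [1.0, -1.0],
    " same": [-1.0, 0.5],
    " any": [-0.5, 1.0],
}

effects = logit_effects(direction, unembed)
print("positive:", top_k(effects, 2, positive=True))
print("negative:", top_k(effects, 2, positive=False))
```

On a real model the same ranking would run over the full vocabulary, which is how a single feature can surface the word for "but" in many languages at once.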

Activations Density 0.053%
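
To put the density figure in context, the short calculation below combines it with the prompt configuration listed above. It assumes, without confirmation from the page, that density means the fraction of all dashboard tokens on which the feature has a nonzero activation; the function names are illustrative.

```python
# Figures taken from the dashboard configuration and density line.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY = 0.053 / 100  # 0.053% expressed as a fraction

def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total number of tokens in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt

def expected_active_tokens(density: float, n_tokens: int) -> float:
    # ASSUMPTION: density = (tokens with nonzero activation) / (all tokens).
    return density * n_tokens

n_tokens = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
n_active = expected_active_tokens(ACTIVATION_DENSITY, n_tokens)

print(f"total tokens:        {n_tokens:,}")
print(f"approx. active ones: {round(n_active):,}")
```

Under that assumption the feature fires on roughly 1,667 of about 3.1 million tokens, so it is sparse.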