Explanations

assumptions and premises

The neuron detects discourse markers that introduce or assert assumptions and premises (e.g. words like “assume,” “assumption,” “premise,” “conceding,” “for granted,” and similar logical‐argument connectors).

Configuration

Prompts (dashboard): 24,576 prompts, 128 tokens each
Dataset (dashboard): monology/pile-uncopyrighted

Negative Logits

Token          Logit
"zeption"      -0.91
"iddhartha"    -0.90
"edoria"       -0.86
" шанс"        -0.86   (Russian: "chance")
" despite"     -0.85
"箐"           -0.84
"iconductor"   -0.82
" רג"          -0.82
"Politik"      -0.82   ("politics")
"เวลา"         -0.82   (Thai: "time")

Positive Logits

Token          Logit
" premise"      1.46
" assumption"   1.34
" assumed"      1.21
" assumptions"  1.13
" baseline"     1.11
"前提"          1.05   (Chinese/Japanese: "premise")
"assumed"       1.05
" asume"        1.04   (Spanish: "assumes")
" established"  1.02
"assume"        1.02
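The dashboard does not say how these logit lists were computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder (output) direction through the model's unembedding matrix and rank vocabulary tokens by the resulting score. This minimal sketch uses a made-up three-dimensional direction and a five-token vocabulary; `decoder_direction`, `unembedding`, `token_logits`, and `top_and_bottom` are all hypothetical names, and the numbers are toy values, not this feature's weights.

```python
# Hypothetical toy data: a 3-dimensional residual stream and a 5-token vocabulary.
# Real models have far larger dimensions and vocabularies.
decoder_direction = [0.5, -1.0, 2.0]  # assumed feature output direction

unembedding = {  # assumed unembedding vector for each token
    " premise":    [0.2, -0.1, 0.6],
    " assumption": [0.1, -0.3, 0.5],
    " despite":    [-0.4, 0.2, -0.3],
    "zeption":     [-0.2, 0.5, -0.1],
    " the":        [0.0, 0.0, 0.0],
}


def token_logits(direction, w_u):
    """Dot product of the feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in w_u.items()}


def top_and_bottom(logits, k):
    """Return the k highest-scoring and the k lowest-scoring (token, logit) pairs."""
    ranked = sorted(logits.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


top, bottom = top_and_bottom(token_logits(decoder_direction, unembedding), k=2)
print([tok for tok, _ in top])     # [' premise', ' assumption']
print([tok for tok, _ in bottom])  # [' despite', 'zeption']
```

Tokens are kept as exact strings, including any leading space, because `" assumed"` and `"assumed"` are distinct vocabulary entries, as the table above shows.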

Activation density: 0.065%
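Activation density is usually the fraction of token positions on which a feature fires (activation above zero). For scale, the dashboard configuration covers 24,576 × 128 = 3,145,728 tokens; if the 0.065% figure were measured over exactly that sample (an assumption, since the page does not state the denominator or the threshold), it would correspond to roughly 2,045 activating positions. The helper `activation_density` is a hypothetical sketch of the calculation, shown on a toy activation matrix.

```python
# Figures taken from the dashboard configuration.
TOTAL_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
total_tokens = TOTAL_PROMPTS * TOKENS_PER_PROMPT  # 3,145,728 tokens


def activation_density(activations):
    """Fraction of token positions where the activation is strictly positive.

    `activations` is a list of prompts, each a list of per-token activations.
    The > 0 threshold is an assumption about how the dashboard counts firing.
    """
    flat = [a for prompt in activations for a in prompt]
    return sum(1 for a in flat if a > 0) / len(flat)


# Assumes the 0.065% density was measured over exactly these tokens.
expected_active = round(0.00065 * total_tokens)  # about 2,045 positions

# Toy example: 2 of 8 positions fire, so the density is 0.25.
print(activation_density([[0.0, 1.2, 0.0, 0.0], [0.0, 0.0, 0.3, 0.0]]))  # 0.25
```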