Explanations

checking for "hello" or "hi"

This neuron does not pick out any particular word type or meaning. Instead, it activates strongly for tokens that appear very late in the input sequence, essentially marking "far-right" (end-of-context) positions.
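The position-based explanation above can be checked with a simple measurement: how much of the total activation falls in the last few token positions. The following is a minimal sketch, not the dashboard's own method. The function names, the `tail_fraction` parameter, and the toy activations are all hypothetical, and it assumes every prompt has the same length.

```python
from statistics import mean


def mean_activation_by_position(activations):
    """Average activation at each token position across prompts.

    `activations` is a list of per-prompt lists of floats, all the same length.
    """
    n_positions = len(activations[0])
    return [mean(row[p] for row in activations) for p in range(n_positions)]


def late_position_ratio(activations, tail_fraction=0.1):
    """Share of the total mean activation that falls in the final positions."""
    by_pos = mean_activation_by_position(activations)
    n_tail = max(1, round(len(by_pos) * tail_fraction))
    total = sum(by_pos)
    if total <= 0:
        return 0.0
    return sum(by_pos[-n_tail:]) / total


# Toy data (made up for illustration): 3 prompts, 10 positions,
# with activity only in the last two positions.
toy = [
    [0.0] * 8 + [0.2, 1.0],
    [0.0] * 8 + [0.0, 0.8],
    [0.0] * 8 + [0.4, 1.2],
]
ratio = late_position_ratio(toy, tail_fraction=0.2)
```

A ratio near 1 would be consistent with an end-of-context feature, while a ratio near `tail_fraction` would suggest no positional preference. Real activations would need to come from the model itself, which this sketch does not cover.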

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

1.01

0.90

0.85

0.81

 attraction

0.78

 bagels

0.78

 attractions

0.77

 cigars

0.76

0.75

 country

0.74

Positive Logits

vaient

0.91

 személy

0.85

 estratégica

0.82

 decía

0.81

étais

0.80

 sostenibilidad

0.79

setPreferred

0.79

𓏸

0.79

 persön

0.78

Quieres

0.77

Activations Density 0.000%
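To put the density figure in context, here is a rough sketch of how such a number could be computed and why it can display as 0.000%. It assumes density means the fraction of tokens with a nonzero activation over the prompt set listed under Configuration (392,802 prompts of 256 tokens each). The dashboard's exact definition and rounding are not stated here, so the function names and the threshold are assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    flat = [a for row in activations for a in row]
    return sum(1 for a in flat if a > threshold) / len(flat)


def format_density(density):
    """Render a fraction as a percentage with three decimal places."""
    return f"{density * 100:.3f}%"


# Figures taken from the Configuration section.
PROMPTS = 392_802
TOKENS_PER_PROMPT = 256
total_tokens = PROMPTS * TOKENS_PER_PROMPT

# Hypothetical case: a feature that fires on only 100 tokens in the whole set
# still rounds to zero at three decimal places.
example = format_density(100 / total_tokens)
```

Under these assumptions, a displayed value of 0.000% indicates a very sparse feature rather than one that never fires.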