Explanations

used for emphasis

The neuron fires on low-frequency or otherwise “unusual” tokens, such as proper names, technical jargon, and non-standard punctuation (e.g. em-dashes or quoted phrases).

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each
Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

(token not rendered)                 0.66
(token not rendered)                 0.64
(token not rendered)                 0.64
" درمان" (Persian, "treatment")      0.63
"Kõ"                                 0.63
" caliper"                           0.63
" rmse"                              0.63
" Scho"                              0.62
"attice"                             0.62
"\}-"                                0.62

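Positive Logits

"这句话" (Chinese, "this sentence")     1.23
"这话" (Chinese, "this remark")         1.13
" якобы" (Russian, "supposedly")        0.93
" implying"                             0.91
"presumably"                            0.87
" কথাটা" (Bengali, "the remark")        0.86
" apparently"                           0.83
" offenbar" (German, "apparently")      0.82
" presumably"                           0.81
" statement"                            0.80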
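A common way to produce top positive and negative logit lists like these is to project the neuron's output (decoder) direction through the model's unembedding matrix and take the highest and lowest entries. The sketch below assumes this dashboard does something similar. The names `decoder_vec`, `W_U`, and `top_logit_effects` are hypothetical and are not taken from the dashboard's code. The sketch also reports signed values, so suppressed tokens appear as negative numbers.

```python
import numpy as np


def top_logit_effects(decoder_vec, W_U, vocab, k=10):
    """Return the k tokens a direction most promotes and most suppresses.

    decoder_vec: (d_model,) output direction of the neuron (assumed available).
    W_U: (d_model, d_vocab) unembedding matrix of the language model.
    vocab: list of d_vocab token strings, indexed like W_U's columns.
    """
    # Change in each token's logit per unit of activation along the direction.
    effects = np.asarray(decoder_vec) @ np.asarray(W_U)  # shape (d_vocab,)
    # Ascending order: most negative (most suppressed) tokens come first.
    order = np.argsort(effects, kind="stable")
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 1.0, -0.1],
                [0.3, 0.9, -0.4, 0.1]])
pos, neg = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
print(pos)  # [('c', 1.0), ('a', 0.5)]
print(neg)  # [('b', -0.2), ('d', -0.1)]
```

Real tooling may apply extra steps, such as normalization, before projecting. This sketch only shows the core projection and ranking.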

Activations Density: 0.534%
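Activation density is the fraction of token positions on which the neuron is active. The minimal sketch below makes two assumptions that this page does not state. First, a position counts as active when its activation exceeds zero. Second, the density was measured over the 392,802 dashboard prompts of 256 tokens each. The helper `activation_density` is hypothetical. Because the displayed 0.534% is itself rounded, the resulting active-token count is only approximate (about 537,000 of roughly 100.6 million positions).

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    `acts` may have any shape, e.g. (n_prompts, tokens_per_prompt).
    The zero threshold is an assumption, not a documented dashboard rule.
    """
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy activations: 2 prompts x 4 tokens, with 2 of 8 positions active.
toy = np.array([[0.0, 1.2, 0.0, 0.0],
                [0.0, 0.0, 3.4, 0.0]])
print(activation_density(toy))  # 0.25

# Rough scale, assuming the density was measured over the dashboard prompts.
total_tokens = 392_802 * 256                    # token positions in the prompt set
approx_active = round(0.00534 * total_tokens)   # 0.534% expressed as a fraction
print(total_tokens, approx_active)              # 100557312 536976
```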