Explanations

AI ethics and safety boundaries

This neuron fires most strongly on the distinctive, content-bearing nouns or proper names that conclude section headings or standalone lines (e.g. “Railway,” “expenses,” “Friday,” “agencies,” “skippers,” “Machinery”).

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

0.82

zione

0.75

0.73

Movie

0.73

น

0.71

tant

0.71

ت

0.71

nt

0.70

0.68

Positive Logits

쭌

0.89

 cramping

0.88

뀨

0.86

 хоро

0.84

𝚇

0.82

 เออ

0.82

 Мы

0.81

 hãng

0.81

 inade

0.80

슌

0.79

Activations Density 0.000%