Explanations

The neuron fires on self-referential statements by an AI, especially the first-person “I” and model identity or disclaimer phrases (e.g. “As a large language model, I do not have subjective experience”).

Configuration

Prompts (Dashboard): 392,802 prompts, 256 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

(token not rendered)  0.70
" Уже"  0.65
" già"  0.64
" Ambassador"  0.61
" Inspired"  0.61
(token not rendered)  0.60
" signé"  0.58
(token not rendered)  0.57
" Signature"  0.57
" സ്വന്ത"  0.57

Positive Logits

" 일반적으로"  0.81
"𒐪"  0.79
" diferentes"  0.79
" algunas"  0.75
" classifications"  0.74
"adecimal"  0.74
" 다음과"  0.73
" genellikle"  0.73
" verschillende"  0.73
" forskellige"  0.73

Activations Density 0.218%
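
To give the density figure a sense of scale, the sketch below converts it into an approximate token count. It rests on two assumptions that the page does not state: that the density is the fraction of tokens with a non-zero activation, and that it was measured over the same 392,802 prompts of 256 tokens listed under Configuration. The function name `estimate_active_tokens` is hypothetical and not part of any dashboard tooling.

```python
# Back-of-the-envelope estimate, not a figure reported by the dashboard.
N_PROMPTS = 392_802               # from "Prompts (Dashboard)"
TOKENS_PER_PROMPT = 256           # from "Prompts (Dashboard)"
ACTIVATION_DENSITY = 0.218 / 100  # 0.218%; ASSUMED to mean the fraction of
                                  # tokens on which the neuron is non-zero


def estimate_active_tokens(n_prompts, tokens_per_prompt, density):
    """Return (total tokens, estimated number of tokens with activation)."""
    total = n_prompts * tokens_per_prompt
    return total, total * density


total_tokens, active_tokens = estimate_active_tokens(
    N_PROMPTS, TOKENS_PER_PROMPT, ACTIVATION_DENSITY
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,.0f}")
```

If either assumption is wrong (for example, if the density was computed on a different sample), the estimate does not apply.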