Explanations

definitions or meanings

This neuron responds to dictionary-entry formatting cues, such as numbered senses, part-of-speech tags (adj, n), bracketed or parenthesized labels, figure references, and similar lexical-definition markers.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Value
"is"           0.58
"ipher"        0.57
"라는"         0.57
"이라는"       0.57
"键"           0.57
" onsite"      0.56
" executed"    0.56
"在于"         0.56
"之"           0.56
"ㅗ"           0.56

Positive Logits

Token (quoted to show leading spaces)    Value
" colloquial"   0.85
" economici"    0.74
" frase"        0.71
" dinheiro"     0.71
" gyak"         0.70
" кг"           0.69
" Türkei"       0.68
" hippie"       0.68
" coppia"       0.67
" чаще"         0.66

Activations Density 0.007%
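
For a rough sense of scale, the density figure can be combined with the prompt configuration listed under Configuration. The sketch below assumes that the 0.007% density was measured over the same 392,802 prompts of 256 tokens each, and that density means the fraction of tokens on which the neuron is active. The page confirms neither assumption, and the variable names are illustrative only.

```python
# Assumption: the density was computed over the dashboard prompt set
# (392,802 prompts, 256 tokens each) and is a per-token fraction.
num_prompts = 392_802
tokens_per_prompt = 256
density_percent = 0.007  # "Activations Density" as reported (rounded)

total_tokens = num_prompts * tokens_per_prompt
estimated_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,.0f}")
```

Under these assumptions the neuron would be active on roughly 7,000 of about 100 million tokens. Because the reported density is rounded, only the order of magnitude is meaningful.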