Explanations

means or implies

The neuron fires on formal second-person address ("you", "your", "Users") in policy or terms-of-service style statements.

Configuration

Prompts (Dashboard)

392,802 prompts, 256 tokens each
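
These two figures fix the size of the scored corpus: 392,802 prompts × 256 tokens = 100,557,312 token positions. Below is a minimal sketch of cutting a tokenized stream into fixed-length prompts. It assumes non-overlapping windows and drops any trailing partial window, but the dashboard states neither. `chunk_into_prompts` is a hypothetical helper and not part of any dashboard API.

```python
from typing import List

PROMPT_LEN = 256       # tokens per prompt, from the configuration above
NUM_PROMPTS = 392_802  # prompt count, from the configuration above


def chunk_into_prompts(token_ids: List[int], prompt_len: int = PROMPT_LEN) -> List[List[int]]:
    """Split a flat token stream into non-overlapping prompts of exactly prompt_len tokens.

    Hypothetical helper. Assumption: trailing tokens that do not fill a whole
    prompt are dropped rather than padded.
    """
    n_full = len(token_ids) // prompt_len
    return [token_ids[i * prompt_len:(i + 1) * prompt_len] for i in range(n_full)]


total_positions = NUM_PROMPTS * PROMPT_LEN
print(f"{total_positions:,} token positions")  # 100,557,312 token positions

# Toy usage: a 600-token stream yields two full prompts; the last 88 tokens are dropped.
prompts = chunk_into_prompts(list(range(600)))
print(len(prompts), [len(p) for p in prompts])  # 2 [256, 256]
```

Dropping the remainder keeps every prompt the same length, which matches the "256 tokens each" wording. Padding the last window would be an equally plausible choice.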

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

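Token          Value   Gloss
" crept"       0.65
" още"         0.59    Bulgarian "still, yet"
" लागू"         0.59    Hindi "applicable, in force"
" ससु"          0.59    Devanagari word fragment
"얇"           0.59    Korean stem of "thin"
"垂"           0.57    "hang down, droop"
" sprouted"    0.57
"🄰"           0.57    squared letter A symbol
" ferret"      0.57
"角落"         0.57    Chinese "corner"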

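Positive Logits

Token          Value   Gloss
"必定"         0.79    Chinese "certainly, must"
" must"        0.77
" guarantees"  0.77
" implies"     0.74
" involves"    0.73
" подразуме"   0.73    Russian fragment of "подразумевает" ("implies")
" means"       0.72
"必然"         0.71    Chinese "inevitably, necessarily"
"you"          0.70
" bedeutet"    0.70    German "means"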
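
Top positive and negative logit lists like these are commonly produced by projecting the feature's output direction through the model's unembedding matrix and ranking the vocabulary. The page does not say how its values were computed, so what follows is a minimal sketch under that assumption. `direction` stands in for the feature's decoder vector and `W_U` for an unembedding matrix of shape (d_model, vocab_size). The five-token vocabulary and the weights are hypothetical toy values, not this model's.

```python
from typing import List, Tuple

import numpy as np


def top_logits(
    direction: np.ndarray, W_U: np.ndarray, vocab: List[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Rank vocabulary tokens by how strongly `direction` promotes or suppresses them.

    direction: (d_model,) vector, assumed to be the feature's output/decoder direction.
    W_U: (d_model, vocab_size) unembedding matrix.
    Returns (top_positive, top_negative), each a list of (token, logit) pairs.
    """
    logits = direction @ W_U          # (vocab_size,) direct effect on each token's logit
    order = np.argsort(logits)        # indices sorted by ascending logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return pos, neg


# Hypothetical toy model: d_model = 3, vocabulary of 5 tokens.
vocab = [" must", " means", " implies", " crept", " ferret"]
W_U = np.array([
    [1.0, 0.8, 0.5, -1.0, -0.6],
    [0.0, 0.3, 0.2,  0.4,  0.1],
    [0.0, 0.0, 0.9,  0.0,  0.2],
])
direction = np.array([1.0, 0.0, 0.0])  # hypothetical feature direction

pos, neg = top_logits(direction, W_U, vocab, k=2)
print(pos)  # [(' must', 1.0), (' means', 0.8)]
print(neg)  # [(' crept', -1.0), (' ferret', -0.6)]
```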

Activation Density: 0.593%
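
Activation density is the share of scored token positions on which the feature is active. A value of 0.593% works out to roughly 6 in every 1,000 positions. Below is a minimal sketch of the computation. It assumes "active" means an activation strictly above zero and counts every scored position in the denominator, but the page states neither. `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper. acts has shape (num_prompts, tokens_per_prompt),
    one activation value per token position.
    """
    return float((acts > threshold).mean())


# Toy example: 4 prompts x 8 tokens, with the feature firing on 3 of the 32 positions.
acts = np.zeros((4, 8))
acts[0, 2] = 1.3
acts[1, 5] = 0.4
acts[3, 7] = 2.1

density = activation_density(acts)
print(f"{density:.3%}")  # 9.375%
```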