Explanations

AI language model

This neuron detects the model’s self-description phrase “As a large language model” (and similar self-referential disclaimers).

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token             Value
"BK"              0.43
" creciente"      0.43
"Б"               0.40
" приветствую"    0.39
(blank)           0.38
" расту"          0.38
" aumentada"      0.38
" blight"         0.37
" समावेश"         0.37
"esp"             0.37

Positive Logits

Token             Value
" doesn"          0.45
"ستانی"           0.41
"doesn"           0.40
"项"              0.38
"紓"              0.37
"വുമായി"          0.37
" item"           0.36
" stanu"          0.36
"정"              0.36
" eikä"           0.36

Activation Density: 0.017%
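
To give a sense of scale for the density figure, here is a minimal Python sketch. It rests on two assumptions that the page does not state: that density means the fraction of token positions where the neuron's activation exceeds a threshold, and that it was measured over the full dashboard prompt set. The function name `activation_density` and the toy data are hypothetical, for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    `activations` is a list of prompts, each a list of per-token activation values.
    Assumption: "density" means this fraction; the dashboard may define it differently.
    """
    total = 0
    active = 0
    for prompt in activations:
        for value in prompt:
            total += 1
            if value > threshold:
                active += 1
    return active / total if total else 0.0


# Toy data (hypothetical): 2 prompts of 4 tokens, with a single active position.
toy = [[0.0, 0.0, 1.3, 0.0], [0.0, 0.0, 0.0, 0.0]]
toy_density = activation_density(toy)

# Scale of the dashboard prompt set, using the figures listed under Configuration.
total_tokens = 238_145 * 512
# Rough count of active positions if 0.017% applies to the whole set (an assumption).
approx_active = round(total_tokens * 0.017 / 100)
```

Under those assumptions, the neuron would fire on roughly twenty thousand of about 122 million token positions, which fits a neuron tied to one specific phrase.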