© Neuronpedia 2026


Neuronpedia

Qwen3-1.7B / 27-LLAMASCOPE-2-LORSA-16K-K64 / Index 16279

Explanations

say "Harry Potter characters"

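Negative Logits

"脾气" (temper): -16.38
"\/\/": -15.69
"街" (street): -15.06
"/layouts": -15.00
"的男人" ("...'s man"): -14.88
">\<": -14.56
"的衣服" ("...'s clothes"): -14.31
"大街" (main street): -14.13
" conduc": -14.00
"街区" (city block): -14.00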

Positive Logits

" phoenix": 22.75
"哈利" (Harry): 22.63
" Hermione": 22.50
"TCL": 22.50
"_lm": 22.25
"LM": 22.25
" Draco": 21.75
"QS": 20.88
"UIS": 20.88
"LM": 20.88

Activations Density 0.347%
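
To put the density figure in concrete terms, here is a minimal arithmetic sketch. It assumes that "Activations Density" means the fraction of tokens on which the feature has a nonzero activation; the page does not state this definition, and the function names below are hypothetical helpers, not part of any Neuronpedia API.

```python
# Assumption: "Activations Density" is the fraction of tokens with a
# nonzero activation for this feature. The page does not define it.
DENSITY_PERCENT = 0.347  # value reported on this page


def expected_active_tokens(num_tokens: int,
                           density_percent: float = DENSITY_PERCENT) -> float:
    """Expected number of tokens on which the feature is active."""
    return num_tokens * density_percent / 100.0


def tokens_per_activation(density_percent: float = DENSITY_PERCENT) -> float:
    """Average number of tokens seen per activating token."""
    return 100.0 / density_percent


if __name__ == "__main__":
    # Roughly 3,470 active tokens per million under the assumption above.
    print(expected_active_tokens(1_000_000))
    # Roughly one activating token in every 288.
    print(round(tokens_per_activation()))
```

Under that reading, the feature is sparse: it would fire on about one token in every 288.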

No Known Activations