© Neuronpedia 2026

Neuronpedia

Qwen3-1.7B · 27-LLAMASCOPE-2-LORSA-16K-K64 · Index 15756

Explanations

say "sw" words

Negative Logits

" Evel"        -22.38
" Cheryl"      -20.75
"oko"          -20.63
"eko"          -20.25
" Pasta"       -20.13
" Nicol"       -19.63
"Neh"          -19.00
" Zimmerman"   -18.75
" Carol"       -18.63
"Carol"        -18.63

Positive Logits

"(sw"          29.50
"_sw"          29.38
" Swift"       27.88
"sw"           25.75
"-sw"          25.75
"Swift"        25.50
"swagger"      25.25
"Sw"           25.25
"_SW"          25.00
" swipe"       24.88
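
The explanation 'say "sw" words' can be checked against the positive-logit tokens listed above. The following is a minimal sketch, not part of Neuronpedia: the `normalize` helper and the choice to ignore leading punctuation, whitespace, and case are assumptions made for illustration.

```python
# Token/logit pairs copied from the Positive Logits list.
# Leading spaces are part of the token and are kept on purpose.
positive_logits = [
    ("(sw", 29.50),
    ("_sw", 29.38),
    (" Swift", 27.88),
    ("sw", 25.75),
    ("-sw", 25.75),
    ("Swift", 25.50),
    ("swagger", 25.25),
    ("Sw", 25.25),
    ("_SW", 25.00),
    (" swipe", 24.88),
]


def normalize(token: str) -> str:
    """Hypothetical normalization: drop leading non-letters, then lowercase."""
    i = 0
    while i < len(token) and not token[i].isalpha():
        i += 1
    return token[i:].lower()


def fraction_matching(pairs, prefix="sw"):
    """Share of tokens whose normalized form starts with the given prefix."""
    hits = [tok for tok, _ in pairs if normalize(tok).startswith(prefix)]
    return len(hits) / len(pairs)


sw_fraction = fraction_matching(positive_logits)
print(f"{sw_fraction:.0%} of the listed positive-logit tokens begin with 'sw'")
```

Under this normalization every listed positive-logit token begins with "sw", which is consistent with the explanation; a stricter, case-sensitive check would give a lower share.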

Activations Density: 7.105%

No Known Activations