© Neuronpedia 2026


Model: Qwen3-1.7B
Source: 26-LLAMASCOPE-2-LORSA-16K-K64
Index: 496

Explanation: say "race" (unknown · unknown)


Negative Logits

"HK"  -20.00
"jsp"  -17.88
"NV"  -17.25
"HK"  -16.63
"EPS"  -16.38
"gz"  -16.25
"jon"  -16.13
"桂"  -16.13
"mdl"  -16.00
"MG"  -15.81

Positive Logits

"种族" (Chinese for "race")  36.00
" racial"  28.38
" racially"  27.75
" race"  26.63
"rac"  24.63
"racial"  24.50
"Race"  24.38
"Rac"  24.38
" races"  23.63
"race"  23.38
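Logit lists like these are commonly produced by projecting a feature's output direction onto the model's unembedding matrix and ranking tokens by the resulting score. The sketch below assumes that procedure, ignores any final layer norm, and uses a made-up five-token vocabulary with toy 2-dimensional vectors; `top_logits` is a hypothetical helper, not a Neuronpedia API.

```python
def top_logits(direction, unembed, vocab, k):
    """Rank tokens by the dot product of `direction` with each unembedding row.

    Assumptions (hypothetical, for illustration): `unembed` holds one
    d_model-length vector per vocabulary entry, and no final layer norm
    or bias is applied before the projection.
    """
    scores = [
        (token, sum(d * w for d, w in zip(direction, row)))
        for token, row in zip(vocab, unembed)
    ]
    ascending = sorted(scores, key=lambda pair: pair[1])
    negative = ascending[:k]          # most negative score first
    positive = ascending[::-1][:k]    # most positive score first
    return positive, negative


# Toy data only: not Qwen3-1.7B weights.
vocab = [" race", " racial", "HK", "jsp", " the"]
unembed = [
    [1.0, 0.5],
    [0.9, 0.4],
    [-1.0, -0.8],
    [-0.7, -0.9],
    [0.0, 0.1],
]
direction = [2.0, 1.0]

positive, negative = top_logits(direction, unembed, vocab, k=2)
print([token for token, _ in positive])  # [' race', ' racial']
print([token for token, _ in negative])  # ['HK', 'jsp']
```

With a real feature, the same ranking would run over the full vocabulary using the feature's actual output vector; only the data changes.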

Activation density: 0.245%

No known activations.
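The density figure can be read as the share of sampled token positions on which the feature is active. A minimal sketch, assuming "active" means an activation strictly above zero; `activation_density` is a hypothetical helper and the sample below is synthetic, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes "active" means strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 245 active positions out of 100,000 sampled tokens.
sample = [0.0] * 99_755 + [1.0] * 245
print(f"{activation_density(sample):.3%}")  # 0.245%
```

Under this definition, a density of 0.245% corresponds to roughly 1 active position in every 400 tokens.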