© Neuronpedia 2026

Neuronpedia

Model: Qwen3-1.7B
Source: 26-LLAMASCOPE-2-LORSA-16K-K64
Index: 8

Explanations

say "India"


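Negative Logits

Token (quoted to show leading spaces)    Logit
"威尔" (Will)                             -19.25
"邮轮" (cruise ship)                      -19.13
" Ludwig"                                -18.88
"美国" (United States)                    -17.63
".lu"                                    -17.50
"奥地利" (Austria)                        -17.25
" Wolfgang"                              -16.88
" Juan"                                  -16.88
"保罗" (Paul)                             -16.88
" Dawson"                                -16.75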

Positive Logits

Token (quoted to show leading spaces)    Logit
"印度" (India)                            70.50
" Indian"                                67.50
" India"                                 66.50
"India"                                  61.50
" Delhi"                                 59.50
"Indian"                                 59.50
" Hindu"                                 58.00
" Hindi"                                 56.00
" Indians"                               55.00
" Bollywood"                             54.75

Activations Density: 2.821%

No Known Activations