© Neuronpedia 2026

Neuronpedia

Qwen3-1.7B
26-LLAMASCOPE-2-LORSA-16K-K64
663

INDEX

Explanations

say "Islamic terms"

Negative Logits

Token          Logit
" Jefferson"   -20.63
"Ford"         -18.38
" Knoxville"   -17.75
"HEN"          -17.50
" Wyoming"     -17.50
" Leah"        -17.38
" Ford"        -17.00
"海棠"         -16.75
"Jeff"         -16.63
"Jackson"      -16.63

Positive Logits

Token          Logit
"伊斯兰"       19.88
"阿拉伯"       18.25
"谯"           18.00
"波"           16.88
"伊朗"         16.38
"米兰"         16.25
" mosque"      16.13
"ブラ"         16.13
" jihad"       15.69
"竞技"         15.69

Activations Density 0.081%

No Known Activations