© Neuronpedia 2026

Neuronpedia

Qwen3-1.7B
27-LLAMASCOPE-2-LORSA-16K-K64
Index: 15808

Explanations

say "truth"

Negative Logits

Token      Logit
"资源"     -17.13
"痉"       -17.00
"符号"     -16.25
"机电"     -16.13
"sgi"      -16.00
"IMS"      -15.94
"兼职"     -15.94
" symb"    -15.63
" Riley"   -15.56
"核心区"   -15.38

Positive Logits

Token         Logit
"Truth"       29.38
" Truth"      28.75
" truth"      28.25
"truth"       27.38
"真理"        24.75
"真相"        24.25
" truths"     23.88
" truthful"   23.50
"_truth"      22.75
" honesty"    21.38
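
Logit lists like the ones above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below illustrates that idea only. It assumes this is how the lists on this page were computed, which the page does not state. The function name `top_logit_tokens`, the two-dimensional toy `unembed` matrix, and all numbers are hypothetical and are not taken from Qwen3-1.7B.

```python
def top_logit_tokens(direction, unembed, vocab, k=2):
    """Return (most promoted, most suppressed) tokens for a feature direction.

    direction: list of floats, length d_model (hypothetical feature output direction)
    unembed:   d_model rows, each with one entry per vocabulary token
    vocab:     token strings, one per unembedding column
    """
    # Logit effect on token j is the dot product of the direction with column j.
    effects = [
        sum(d * row[j] for d, row in zip(direction, unembed))
        for j in range(len(vocab))
    ]
    # Token indices sorted from lowest to highest logit effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    return positive, negative


# Toy data (made up): 2-dimensional residual stream, 5-token vocabulary.
vocab = ["Truth", " truth", " honesty", " Riley", "IMS"]
unembed = [
    [2.0, 1.5, 1.0, -1.0, -2.0],
    [0.5, -0.5, 0.0, 0.25, 0.75],
]
direction = [1.0, 0.0]

positive, negative = top_logit_tokens(direction, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Tokens are kept as quoted strings because a leading space is part of the token: "Truth" and " Truth" are distinct vocabulary entries with different scores.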

Activations Density 0.204%

No Known Activations