Neuronpedia

Explanations

states a contradiction

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-1b-pt/resid_post/layer_13_width_16k_l0_medium

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

،    4.26
(no visible token)    3.89
、    3.37
、「    3.11
®,    2.77
,—    2.77
、“    2.70
-,    2.66
ّ    2.62
(no visible token)    2.58

Positive Logits

 yani    2.64
 yada    2.56
ie    2.39
 보면은    2.38
 which    2.38
 które    2.35
 wobei    2.35
 namely    2.34
 atleast    2.33
 albeit    2.33
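
The logit lists above rank vocabulary tokens by how strongly the feature pushes them up or down. A minimal sketch of how such a ranking is commonly produced, assuming the scores are dot products between the feature's decoder direction and each token's unembedding row (this page does not state the exact method). The function name `top_logit_tokens` and all vectors are hypothetical toy values, not data from this feature.

```python
def top_logit_tokens(direction, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature direction with each
    token's unembedding row. Returns (most promoted, most suppressed)."""
    scores = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    order = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    positive = [(vocab[i], scores[i]) for i in order[:k]]
    negative = [(vocab[i], scores[i]) for i in order[::-1][:k]]
    return positive, negative


# Toy 2-dimensional example (hypothetical values, for illustration only).
vocab = [" which", " namely", ",", "、"]
unembed = [
    [1.0, 0.0],   # " which"
    [0.8, 0.1],   # " namely"
    [-0.9, 0.2],  # ","
    [-1.0, 0.0],  # "、"
]
direction = [2.0, 0.5]

positive, negative = top_logit_tokens(direction, unembed, vocab, k=2)
print("promoted:", [tok for tok, _ in positive])
print("suppressed:", [tok for tok, _ in negative])
```

In this toy setup the connective-like tokens come out on the promoted side and the punctuation tokens on the suppressed side, mirroring the shape of the lists above.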

Activations Density 2.589%
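
To give the density figure a sense of scale, here is a rough calculation, assuming the density is the fraction of tokens with a nonzero activation and that it was measured over the dashboard prompt set listed above (392,802 prompts of 256 tokens). Both are assumptions; the page does not state them. The helper name `estimated_active_tokens` is hypothetical.

```python
def estimated_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, estimated tokens on which the feature fires)."""
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


# Figures taken from the Prompts (Dashboard) and Activations Density fields.
total, active = estimated_active_tokens(392_802, 256, 2.589)
print(f"total tokens: {total:,}")
print(f"estimated active tokens: {active:,}")
```

Under these assumptions the feature would fire on roughly 2.6 million of about 100.6 million tokens.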

No Known Activations

© Neuronpedia 2025
