Neuronpedia



Explanations

possessive pronouns followed by nouns

np_acts-logits-general · gemini-2.5-flash-lite


Configuration

google/gemma-scope-2-1b-pt/resid_post/layer_13_width_16k_l0_medium
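The identifier above appears to pack several settings into one slash-separated string. As a minimal sketch, assuming the four segments are organization, release, hook point, and a variant of the form `layer_<n>_width_<w>_l0_<level>`, it could be split like this. The `SaeId` type, its field names, and `parse_sae_id` are hypothetical helpers for illustration, not part of any Neuronpedia or Gemma Scope API, and the segment meanings are a reading of the string rather than a documented schema.

```python
import re
from typing import NamedTuple


class SaeId(NamedTuple):
    """Hypothetical container; field names are assumptions, not an official schema."""
    org: str
    release: str
    hook: str
    layer: int
    width: str
    l0: str


def parse_sae_id(sae_id: str) -> SaeId:
    # Expect exactly four slash-separated segments.
    parts = sae_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 segments, got {len(parts)}: {sae_id!r}")
    org, release, hook, variant = parts

    # Assumed variant layout: layer_<int>_width_<label>_l0_<label>
    match = re.fullmatch(r"layer_(\d+)_width_(\w+?)_l0_(\w+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant segment: {variant!r}")

    return SaeId(org, release, hook, int(match.group(1)), match.group(2), match.group(3))


if __name__ == "__main__":
    print(parse_sae_id("google/gemma-scope-2-1b-pt/resid_post/layer_13_width_16k_l0_medium"))
```

Under these assumptions, the string on this page would read as layer 13 of the residual stream (`resid_post`), with a width label of `16k` and an L0 label of `medium`.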

Prompts (Dashboard)

392,802 prompts, 256 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted


Negative Logits

Token    Logit
ぇ    2.29
În    2.18
În    2.15
РА    2.14
Не    2.11
ろん    2.09
ตรี    2.05
лі    2.03
 মানবতার    2.01
錢    1.96

Positive Logits

Token    Logit
ITY    2.67
ity    2.63
ities    2.57
ized    2.31
ität    2.26
ization    2.22
itet    2.21
izacja    2.11
IZATION    2.03
তাবাদ    2.02

Activations Density 1.780%

No Known Activations

© Neuronpedia 2025
