Neuronpedia

Explanations

identifies things as

np_acts-logits-general · gemini-2.5-flash-lite

Configuration

google/gemma-scope-2-4b-it/resid_post/layer_22_width_65k_l0_medium
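The configuration string above appears to pack several fields into one path. The following is a minimal sketch of splitting it. It assumes the segments are an organisation, an SAE release, a hook point, and a variant naming the layer, dictionary width, and L0 setting. These field meanings are an assumption and are not stated on the page. `parse_source_id` is a hypothetical helper, not part of any Neuronpedia library.

```python
import re

SOURCE_ID = "google/gemma-scope-2-4b-it/resid_post/layer_22_width_65k_l0_medium"


def parse_source_id(source_id: str) -> dict:
    """Split a source path into its parts (field meanings are assumed)."""
    parts = source_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments, got {len(parts)}")
    org, release, hook, variant = parts

    # Assumed variant layout: layer_<int>_width_<size>_l0_<setting>
    match = re.fullmatch(r"layer_(\d+)_width_([0-9a-z]+)_l0_([0-9a-z]+)", variant)
    if match is None:
        raise ValueError(f"unrecognised variant: {variant!r}")
    layer, width, l0 = match.groups()

    return {
        "org": org,
        "release": release,
        "hook": hook,
        "layer": int(layer),
        "width": width,
        "l0": l0,
    }


info = parse_source_id(SOURCE_ID)
print(info["hook"], info["layer"], info["width"], info["l0"])
```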

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

0.70

 Gemeind

0.67

 આજે

0.66

 जिनमें

0.66

ര്‍ഷ

0.65

 дуже

0.65

 allons

0.65

ྕ

0.65

 sampe

0.64

 위해서는

0.64

Positive Logits

"as"  4.44
" sebagai"  4.42
"作为"  3.98
" 作为"  3.66
"作為"  3.62
" jako"  3.53
" Sebagai"  3.34
" ως"  3.32
"作为一个"  3.21
"als"  2.96

Activations Density 0.884%

No Known Activations
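As a rough sanity check on the activation density, it can be combined with the dashboard prompt counts. This sketch assumes the 0.884% density is measured per token over the same 238,145 prompts of 512 tokens each, which the page does not state. `expected_active_tokens` is a hypothetical helper name.

```python
def expected_active_tokens(prompts: int, tokens_per_prompt: int, density_pct: float) -> tuple[int, int]:
    """Return (total tokens, estimated tokens on which the feature fires)."""
    total = prompts * tokens_per_prompt
    # Assumption: density is the fraction of tokens with a nonzero activation.
    active = round(total * density_pct / 100)
    return total, active


total_tokens, active_tokens = expected_active_tokens(238_145, 512, 0.884)
print(total_tokens)   # 121930240
print(active_tokens)  # 1077863
```

Under that assumption the feature would fire on roughly 1.08 million of about 122 million tokens.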

© Neuronpedia 2025
