Neuronpedia


INDEX

Explanations

life threatening

Explanation type: np_acts-logits-general · Explainer model: gemini-2.5-flash-lite


Configuration

google/gemma-scope-2-27b-it/resid_post/layer_53_width_262k_l0_medium
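The configuration string appears to pack several settings into one path. The sketch below splits it into parts. It assumes, from the naming pattern alone and not from any documented schema, that the segments are a release name, a hook point, and a variant that encodes the layer, the dictionary width, and an L0 (sparsity) setting. `parse_sae_id` is a hypothetical helper, not part of any Neuronpedia or SAE library API.

```python
import re

SAE_ID = "google/gemma-scope-2-27b-it/resid_post/layer_53_width_262k_l0_medium"

def parse_sae_id(sae_id: str) -> dict:
    """Split a configuration string into its visible parts.

    Hypothetical helper: the field names below are inferred from the
    naming pattern, not taken from a documented format.
    """
    # Everything before the last two segments is treated as the release name.
    *release_parts, hook_point, variant = sae_id.split("/")
    match = re.fullmatch(r"layer_(\d+)_width_([^_]+)_l0_([^_]+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant name: {variant!r}")
    layer, width, l0 = match.groups()
    return {
        "release": "/".join(release_parts),
        "hook_point": hook_point,
        "layer": int(layer),
        "width": width,  # kept as the raw label, e.g. "262k"
        "l0": l0,        # kept as the raw label, e.g. "medium"
    }

print(parse_sae_id(SAE_ID))
# {'release': 'google/gemma-scope-2-27b-it', 'hook_point': 'resid_post', 'layer': 53, 'width': '262k', 'l0': 'medium'}
```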

Prompts (Dashboard)

238,145 prompts, 512 tokens each
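The prompt count and prompt length together give the number of token positions in the dashboard prompt set. A quick check of the arithmetic:

```python
# Size of the dashboard prompt set, as stated on the page.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512

total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT
print(f"{total_tokens:,} tokens")  # 121,930,240 tokens
```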

Dataset (Dashboard)

lmsys + oasst1


NEGATIVE LOGITS

" serendip"  0.41
"踌"  0.40
"Hes"  0.39
"രണം"  0.38
" DISCLAIM"  0.38
"াজী"  0.37
"romptu"  0.37
"ilage"  0.36
"唏"  0.36
"ahl"  0.36

POSITIVE LOGITS

" threat"  4.03
"Threat"  3.86
" threats"  3.84
"threat"  3.73
" Threat"  3.67
" Threats"  3.61
"威胁"  3.55  (Chinese for "threat")
" threaten"  3.53
" amea"  3.42
" threatening"  3.31
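Positive and negative logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The sketch below shows that projection with NumPy on toy matrices. It illustrates the technique under that assumption and is not Neuronpedia's pipeline; it ignores any normalization (such as a final layer norm) that a real implementation may apply. `top_logit_tokens` is a hypothetical name, and it returns raw signed values, so the dashboard's scaling or sign convention may differ.

```python
import numpy as np

def top_logit_tokens(decoder_vec, unembed, vocab, k=10):
    """Return the k most positive and k most negative tokens for one feature.

    decoder_vec: shape (d_model,), the feature's decoder direction
    unembed:     shape (d_model, vocab_size), the unembedding matrix
    vocab:       list of token strings, indexed like the unembedding columns
    """
    logits = decoder_vec @ unembed          # shape (vocab_size,)
    order = np.argsort(logits)              # token ids, lowest logit first
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    return positive, negative

# Toy example: d_model = 4, a 5-token vocabulary, made-up weights.
vocab = ["alpha", "beta", "gamma", "delta", "epsilon"]
decoder_vec = np.array([1.0, 0.0, 0.0, 0.0])
unembed = np.zeros((4, 5))
unembed[0] = [3.0, -1.0, 0.5, 2.0, -2.0]   # only this row contributes here

positive, negative = top_logit_tokens(decoder_vec, unembed, vocab, k=2)
print(positive)  # [('alpha', 3.0), ('delta', 2.0)]
print(negative)  # [('epsilon', -2.0), ('beta', -1.0)]
```

Sorting once with `np.argsort` and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of hundreds of thousands of tokens, `np.argpartition` would avoid a full sort.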

Activation density: 0.068%

No Known Activations
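Activation density is usually defined as the fraction of token positions on which a feature's activation is above zero. At 0.068%, that works out to about 68 positions per 100,000. Below is a minimal sketch of the calculation. It assumes that definition and uses a hypothetical `activation_density` helper that operates on a NumPy array of activations.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation is above `threshold`."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy example: 2 prompts x 4 tokens; the feature fires at 1 of 8 positions.
acts = np.array([[0.0, 0.0, 1.7, 0.0],
                 [0.0, 0.0, 0.0, 0.0]])
density = activation_density(acts)
print(f"{density:.3%}")  # 12.500%
```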

© Neuronpedia 2025
