
Explanations

motive

np_max-act · gemini-2.0-flash

instances of determining or speculating about the motive or cause behind a crime or action.

oai_token-act-pair · gpt-4o Triggered by @tcai

Same activations, but with all zeros filtered out:

<start> robbery 0.3276 : 0.2369 determine 0.2305 a 0.7637 motive 2.1387 for 1.2744 the 0.7173 shooting 0.3594 , 0.4402 . 0.2903 This 0.2578 happened 0.4341 MB 0.6084 this 0.2445 was 0.9663 an 0.8892 isolated 0.8862 incident 0.7539 and 0.4282 haven’t 0.4282 found 0.9287 a 0.7637 motive 2.1152 for 0.9297 the 0.3113 shooting 0.3594 . 0.2323 Arundel 0.2871 responsible 0.2175 for 0.2109 the 0.3066 attack 0.5010 Arundel 0.2646 investigating 0.3206 a 0.7637 motive 2.0977 for 1.2744 the 0.7173 shooting 0.4343 <end>

Explanation of neuron 4 behavior: the main thing this neuron does is find crime-reporting or investigative language cues (mentions of crime acts, incidents, motives and investigation terms).
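The record above uses a token/activation-pair format: each token is followed by its activation, and zero-activation tokens are dropped. A minimal sketch of that filtering step, plus picking the strongest token, follows. The helper names are hypothetical, and the two zero-activation tokens in the sample are invented for illustration; the nonzero values are copied from the record.

```python
# Hypothetical helpers illustrating the "zeros filtered out" record format.

def format_nonzero(pairs):
    """Keep only (token, activation) pairs with a nonzero activation and
    render them as 'token activation' separated by spaces."""
    kept = [(tok, act) for tok, act in pairs if act != 0.0]
    return " ".join(f"{tok} {act:.4f}" for tok, act in kept)

def top_token(pairs):
    """Return the (token, activation) pair with the highest activation."""
    return max(pairs, key=lambda p: p[1])

# Nonzero values copied from the record; "police" and "said" are invented
# zero-activation tokens added only to show the filtering.
record = [("police", 0.0), ("determine", 0.2305), ("a", 0.7637),
          ("motive", 2.1387), ("for", 1.2744), ("the", 0.7173),
          ("shooting", 0.3594), ("said", 0.0)]

print(format_nonzero(record))
print(top_token(record))
```

As in the full record, the peak activation lands on "motive", with the following "for" also firing strongly.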

oai_token-act-pair · o4-mini Triggered by @xinyanhu8

events related to violent crimes involving multiple victims.

oai_token-act-pair · gpt-4o-mini Triggered by @xinyanhu8


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_11/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.11.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted
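For readers unfamiliar with the "standard" architecture label, here is a minimal sketch of a ReLU sparse autoencoder applied to one residual-stream vector. It assumes the common formulation f = ReLU(W_enc (x − b_dec) + b_enc) and x̂ = W_dec f + b_dec; this page does not confirm the exact variant used by this trainer. The function names and toy weights are hypothetical. The real dictionary has 131,072 features over Llama 3.1 8B's 4,096-dimensional residual stream at blocks.11.hook_resid_post, whereas the toy uses 3 features over 2 dimensions so the arithmetic can be checked by hand.

```python
# Toy ReLU SAE in pure Python (assumed "standard" formulation, hypothetical weights).

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def encode(x, w_enc, b_enc, b_dec):
    """Feature activations f = ReLU(W_enc (x - b_dec) + b_enc).
    w_enc has shape n_features x d_model."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(w_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]

def decode(f, w_dec, b_dec):
    """Reconstruction x_hat = W_dec f + b_dec.
    w_dec has shape d_model x n_features."""
    return [r + b for r, b in zip(matvec(w_dec, f), b_dec)]

# Hypothetical toy parameters: d_model = 2, n_features = 3.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, -0.5, 0.0]
w_dec = [[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]]
b_dec = [0.0, 0.0]

x = [2.0, 0.25]
f = encode(x, w_enc, b_enc, b_dec)      # only feature 0 survives the ReLU
x_hat = decode(f, w_dec, b_dec)
print(f, x_hat)
```

The ReLU zeroes the second feature (pre-activation −0.25) and the third (−2.25), which is the mechanism behind the very low activation densities such dictionaries report.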


Negative Logits

"VIC" -0.07
"갈" -0.07
" aquarium" -0.07
" Greek" -0.06
"Greek" -0.06
"華" -0.06
"SERVICE" -0.06
"urations" -0.06
" ماد" -0.06
" bedtime" -0.06

Positive Logits

"\Catalog" 0.07
"/power" 0.06
".Client" 0.06
"uyệt" 0.06
",''" 0.06
"식" 0.06
"订单" 0.06
"................................" 0.06
"dataType" 0.06
"postId" 0.06
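A minimal sketch of how logit lists like these can be produced, assuming (not confirmed by this page) that each entry is the dot product of the feature's decoder direction with one column of the unembedding matrix W_U, ranked from highest to lowest. All names and numbers are hypothetical toy values; only the token strings are borrowed from the lists above.

```python
# Hypothetical logit-lens style projection of one SAE feature direction.

def feature_logits(d_dec, w_u):
    """Project a decoder direction (length d_model) through W_U
    (d_model x vocab), returning one logit per vocabulary entry."""
    vocab = len(w_u[0])
    return [sum(d_dec[i] * w_u[i][j] for i in range(len(d_dec)))
            for j in range(vocab)]

def top_bottom(logits, tokens, k):
    """Return the k highest (descending) and k lowest (ascending) tokens."""
    ranked = sorted(zip(tokens, logits), key=lambda p: p[1])
    return ranked[-k:][::-1], ranked[:k]

tokens = ["dataType", " aquarium", "postId", " bedtime"]
d_dec = [1.0, -1.0]                       # toy decoder direction, d_model = 2
w_u = [[0.5, -0.25, 0.75, 0.0],           # toy unembedding, 2 x 4
       [0.25, 0.5, -0.25, 0.25]]

logits = feature_logits(d_dec, w_u)
positive, negative = top_bottom(logits, tokens, 2)
print(positive)
print(negative)
```

Leading spaces are kept in the token strings because tokenizers treat " aquarium" and "aquarium" as distinct entries.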

Activation Density: 0.027%
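Reading the density as the share of sampled token positions on which the feature fires with a nonzero activation (an assumption about how the dashboard defines it), 0.027% means roughly one activating token in every 3,700, or about 0.28 per 1,024-token context. The helper name below is hypothetical.

```python
# Hypothetical helper: density = fraction of positions with activation > 0.

def activation_density(acts):
    """Fraction of token positions where the feature activation is > 0."""
    return sum(1 for a in acts if a > 0) / len(acts)

density = 0.027 / 100                     # 0.027% expressed as a fraction
context_size = 1024                       # context size listed in the configuration
tokens_per_firing = 1 / density           # about 3,703.7 tokens
expected_per_context = density * context_size   # about 0.276 firings

print(activation_density([0.0, 0.0, 2.1387, 0.0]))
print(round(tokens_per_firing, 1), round(expected_per_context, 3))
```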


No Known Activations
