Explanations

array
np_max-act · gemini-2.0-flash

prompts and responses involving explicit sexual roleplay setups, coercive in-character instructions, and erotic dialogue contexts.
oai_token-act-pair · gpt-5 · Triggered by @vetterc0

requests or exchanges that try to initiate explicit, sexualized roleplay with the assistant, often including coercive or boundary-pushing instructions.
oai_token-act-pair · gpt-5 · Triggered by @vetterc0

Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_15/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

ครอง	-0.07
 Burb	-0.07
 shimmer	-0.06
 twee	-0.06
 Pace	-0.06
 Ginny	-0.06
แชม	-0.06
.parameters	-0.06
 hodin	-0.06
 retrieved	-0.06

Positive Logits

:UIAlert	0.07
所以	0.07
EDIT	0.06
ैद	0.06
StartupScript	0.06
 resistance	0.06
:set	0.06
athroom	0.06
Persist	0.06
 endure	0.06

Activations Density 0.040%

No Known Activations