
Explanations

This feature appears to specialize in detecting sexually explicit content and NSFW terminology, particularly:
- Sexual acts and kinks (e.g., "breeding," "anal play," "exhibitionism")
- NSFW roleplay dynamics (e.g., "freeuse," "shared use")
- Slang and vulgar language tied to erotic contexts (e.g., "bimbo," "creampie")
- Physical descriptors with sexual connotations (e.g., "wide hips," "full lips")

It activates strongly on words and phrases common in adult content, erotic roleplay scenarios, and sexually charged descriptions.

Method: oai_token-act-pair · Explainer model: deepseek-v3 · Triggered by @grunklewordner
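The method name "oai_token-act-pair" suggests an OpenAI-style explainer prompt in which each top-activating example is listed token by token alongside its activation. The sketch below shows that kind of formatting, assuming the convention from OpenAI's automated-interpretability work of tab-separated token/score lines with activations rescaled to integers from 0 to 10. The helper name and exact layout are hypothetical and are not Neuronpedia's actual prompt code.

```python
def format_token_activation_pairs(tokens, activations, max_activation=None):
    """Hypothetical helper: render one example as "token<TAB>score" lines.

    Scores are activations rescaled to integers from 0 to 10 relative to
    max_activation (defaults to the largest activation in this example).
    """
    if len(tokens) != len(activations):
        raise ValueError("tokens and activations must have the same length")
    if max_activation is None:
        max_activation = max(activations, default=0.0)
    lines = []
    for tok, act in zip(tokens, activations):
        if max_activation <= 0:
            score = 0  # nothing fired, so every token gets the minimum score
        else:
            # Clamp to [0, max_activation], then round onto the 0..10 scale.
            clamped = min(max(act, 0.0), max_activation)
            score = round(10 * clamped / max_activation)
        lines.append(f"{tok}\t{score}")
    return "\n".join(lines)


print(format_token_activation_pairs(["The", " cat", " sat"], [0.0, 2.5, 5.0]))
```

Rescaling to a small integer range keeps the prompt compact and makes examples with different raw activation magnitudes comparable for the explainer model.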


Configuration

andyrdt/saes-llama-3.1-8b-instruct/resid_post_layer_23/trainer_1

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.23.hook_resid_post
Architecture: standard
Context Size: 1,024 tokens
Dataset: monology/pile-uncopyrighted
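The configuration describes a "standard" SAE with 131,072 features reading the residual stream after block 23 (blocks.23.hook_resid_post). The sketch below shows the forward pass such an SAE typically computes, assuming the common ReLU formulation in which the decoder bias is subtracted before encoding; the page does not state this trainer's exact parameterisation, and the toy weights are made up for illustration. Llama 3.1 8B has a residual width (d_model) of 4096.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def sae_encode(x, W_enc, b_enc, b_dec):
    """ReLU SAE encoder: f = ReLU(W_enc @ (x - b_dec) + b_enc).

    W_enc has shape (n_features, d_model); the result has n_features entries.
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [p + b for p, b in zip(matvec(W_enc, centered), b_enc)]
    return [max(0.0, p) for p in pre]


def sae_decode(f, W_dec, b_dec):
    """Decoder: x_hat = W_dec @ f + b_dec, with W_dec of shape (d_model, n_features)."""
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]


# Toy sizes: d_model = 2, n_features = 3 (the real SAE uses 4096 and 131,072).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b_enc = [0.0, 0.0, 0.0]
W_dec = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_dec = [0.0, 0.0]

x = [2.0, 3.0]  # stand-in for one residual-stream vector
f = sae_encode(x, W_enc, b_enc, b_dec)
print(f)  # the third feature (index 2) is zeroed by the ReLU
print(sae_decode(f, W_dec, b_dec))
```

The feature on this page corresponds to a single index of f; its "activation" on a token is that one entry of the encoder output.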


Negative Logits

"рап"  -0.07
" cosine"  -0.07
"xor"  -0.07
"еку"  -0.07
"福利"  -0.07
" cozy"  -0.07
"entered"  -0.07
" fundamental"  -0.06
" Parti"  -0.06
" návr"  -0.06

Positive Logits

"(gameObject"  0.06
"дан"  0.06
" bella"  0.06
"_album"  0.06
"О"  0.06
" proceed"  0.06
"●●●●●●●●"  0.06
"Cancel"  0.06
"\Services"  0.06
"⟩"  0.06
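Logit lists like these are usually produced by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the most positive and most negative tokens. The sketch below assumes that plain projection (ignoring any final layer norm, which dashboards may or may not apply); the function names and toy matrices are hypothetical.

```python
def feature_logit_effects(decoder_dir, W_U, vocab):
    """Dot one feature's decoder direction (length d_model) with every
    column of the unembedding W_U (shape d_model x vocab_size)."""
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[k] * W_U[k][j] for k in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


effects = feature_logit_effects(
    [1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"]
)
positive, negative = top_and_bottom(effects, 1)
print(positive, negative)  # [('a', 0.5)] [('b', -0.2)]
```

Because only the direct path from the feature to the output logits is measured, a feature can have small or unrelated-looking logit effects even when its activations are easy to interpret.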

Activation density: 0.064%
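Assuming density means the fraction of sampled token positions where the feature's activation is strictly above zero, 0.064% works out to roughly one firing every 1,562 tokens, or about 0.66 expected firings per 1,024-token context. The helper below is a hypothetical sketch of that calculation, not the dashboard's own code.

```python
def activation_density(activations):
    """Fraction of token positions where the feature's activation is above zero."""
    if not activations:
        raise ValueError("need at least one activation")
    return sum(1 for a in activations if a > 0) / len(activations)


density = 0.00064  # 0.064% expressed as a fraction
print(round(1 / density, 1))      # average tokens per firing: 1562.5
print(round(density * 1024, 3))   # expected firings per 1,024-token context: 0.655
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # toy example: 0.25
```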


No Known Activations
