Explanations

fictional AI assistant

np_acts-logits-general · gemini-2.5-flash-lite

instructions and formatting cues related to chatbot roleplay/meta-conversation, especially second-person prompts addressing an AI and assistant-style header markers.

oai_token-act-pair · gpt-5 · Triggered by @vetterc0

Top Features by Cosine Similarity

Comparing With LLAMA3.3-70B-IT @ 50-resid-post-gf

Configuration

Goodfire/Llama-3.3-70B-Instruct-SAE-l50/Llama-3.3-70B-Instruct-SAE-l50.pt

Prompts (Dashboard)

10,000 prompts, 128 tokens each

Dataset (Dashboard)

lmsys/lmsys-chat-1m

Negative Logits

" Heller"            -0.10
" Kling"             -0.09
"angi"               -0.09
"rs"                 -0.09
"instr"              -0.09
" Gors"              -0.09
"warts"              -0.09
" mutate"            -0.09
(token not shown)    -0.08

Positive Logits

"AI"                 0.16
"AI"                 0.13
" artificial"        0.12
" entity"            0.11
" system"            0.11
" machine"           0.11
"ai"                 0.11
" interface"         0.10
"scp"                0.10
"Override"           0.10

Activations Density 0.294%
