INDEX

Explanations

sentiments related to honesty, openness, or vulnerability

oai_token-act-pair · gemini-2.0-flash

Configuration

fnlp/Llama-Scope-R1-Distill/400M-Slimpajama-400M-OpenR1-Math-220k/L21R

Prompts (Dashboard)    24,576 prompts, 128 tokens each
Dataset (Dashboard)    Hzfinfdu/SlimPajama-3B and open-r1/OpenR1-Math-220k
Features               32,768
Data Type              float32
Hook Name              blocks.21.hook_resid_post
Architecture           jumprelu
Context Size           1,024
Dataset                cerebras/SlimPajama-627B
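
The architecture is listed as jumprelu. As a rough illustration only, assuming the usual JumpReLU definition (a pre-activation passes through unchanged when it exceeds a per-feature threshold and is zeroed otherwise), the activation can be sketched as below. The function name, the pre-activation values, and the thresholds are hypothetical and are not taken from this SAE.

```python
def jumprelu(pre_activations, thresholds):
    """Elementwise JumpReLU: keep a value only if it exceeds its threshold."""
    if len(pre_activations) != len(thresholds):
        raise ValueError("expected one threshold per feature")
    return [z if z > t else 0.0 for z, t in zip(pre_activations, thresholds)]


# Hypothetical pre-activations and thresholds for four features.
pre = [0.05, 0.40, -0.30, 1.20]
theta = [0.10, 0.25, 0.10, 1.50]
print(jumprelu(pre, theta))
```

Unlike a plain ReLU, a positive pre-activation that stays below its threshold (the first and last entries here) is zeroed rather than passed through.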

Negative Logits

Token             Logit
"alim"            -0.07
"orget"           -0.07
" Hubb"           -0.06
"loat"            -0.06
"èĲ"              -0.06
"ÅĽcie"           -0.06
".AddListener"    -0.06
"lendir"          -0.06
"åĨĮ"             -0.06
"addon"           -0.06

Positive Logits

Token               Logit
" vulnerability"    0.17
" honest"           0.16
" honesty"          0.16
" cand"             0.16
" candid"           0.16
"raw"               0.15
" Honest"           0.14
" vulnerable"       0.14
" vulner"           0.14
" Vulner"           0.14

Activations Density 0.030%
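
To put the density figure in context, here is a minimal sketch that assumes density means the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the dashboard prompts listed in the configuration (24,576 prompts of 128 tokens each). Both points are assumptions rather than something this page states, and the helper name is hypothetical.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is above zero."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Figures copied from the dashboard; the density is a rounded display value.
PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
DENSITY = 0.030 / 100

total_tokens = PROMPTS * TOKENS_PER_PROMPT
implied_active_tokens = round(total_tokens * DENSITY)
print(total_tokens, implied_active_tokens)
```

Under those assumptions the feature fires on roughly 944 of 3,145,728 tokens. Because the displayed percentage is rounded, the true count could differ by a few dozen tokens.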

No Known Activations