Explanations

emailing refunds and services

np_acts-logits-general · gemini-2.5-flash-lite

direct commands, imperative language, and assertive action-oriented discourse.

oai_token-act-pair · claude-4-5-haiku Triggered by @jamesnaruto04

Pronouns and determiners referring to entities described as performing actions perceived negatively or associated with problematic behavior, often in contexts involving complaints, scams, unprofessional conduct, or morally questionable systems.

eleuther_acts_top20 · claude-4-5-sonnet Triggered by @jamesnaruto04

Configuration

google/gemma-scope-2-27b-it/resid_post/layer_31_width_65k_l0_medium

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 केंद्रित

0.43

 ദി

0.42

 spacetime

0.40

 pathogenesis

0.40

 inhabiting

0.40

 utilisant

0.39

Convolution

0.39

 moléculas

0.39

 ንጥረ

0.38

 evolutionary

0.38

Positive Logits

 refund

0.70

 refunds

0.66

 emailed

0.64

 refundable

0.61

 prepaid

0.61

 email

0.57

 refunded

0.57

 Refund

0.56

 service

0.54

 invoices

0.54

Activations Density 0.295%

No Known Activations