Configuration

Dataset (Dashboard): Various
Features: 131,072
Data Type: float32
Hook Name: blocks.15.hook_resid_post
Architecture: standard
Context Size: 1,024
Dataset: monology/pile-uncopyrighted

Negative Logits

"ethical"     -0.07
"口水"        -0.07
"וציא"        -0.07
"celand"      -0.07
"sent"        -0.07
"_prime"      -0.06
"ﳐ"           -0.06
"性命"        -0.06
"出局"        -0.06
" tentative"  -0.06

Positive Logits

" transformations"  0.07
"Formation"         0.07
" masters"          0.07
"malıdır"           0.07
"Spa"               0.07
" raster"           0.07
" Watch"            0.07
" rửa"              0.07
"Sh"                0.07

Activations Density: 0.023%

No Known Activations
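
To put the density figure in perspective, the short sketch below converts it into token counts. It assumes that "Activations Density" means the fraction of token positions on which the feature is active, which this page does not state explicitly; the function and variable names are hypothetical and chosen only for illustration.

```python
# Assumption: "Activations Density" is the fraction of token positions
# on which the feature has a nonzero activation.
DENSITY_PERCENT = 0.023   # "Activations Density" from the page
CONTEXT_SIZE = 1024       # "Context Size" from the configuration


def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of active token positions among n_tokens."""
    return (density_percent / 100.0) * n_tokens


def tokens_per_activation(density_percent: float) -> float:
    """Average number of tokens seen per activation."""
    return 100.0 / density_percent


per_context = expected_active_tokens(DENSITY_PERCENT, CONTEXT_SIZE)
spacing = tokens_per_activation(DENSITY_PERCENT)

print(f"Expected active tokens per {CONTEXT_SIZE}-token context: {per_context:.3f}")
print(f"Roughly one activation every {spacing:.0f} tokens")
```

Under that assumption, the feature would fire on well under one token per full-length context on average, which is in line with the page listing no known activations.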