
Explanations

phrases related to lying or deception

oai_token-act-pair · gpt-3.5-turbo


Configuration

neuronpedia/gpt2-small__res_scl-ajt/6-res_scl-ajt

Prompts (Dashboard): 12,288 prompts, 128 tokens each
Dataset (Dashboard): Skylion007/openwebtext
Features: 46,080
Data Type: torch.float32
Hook Point: blocks.6.hook_resid_pre
Architecture: standard
Context Size: 128
Dataset: apollo-research/Skylion007-openwebtext-tokenizer-gpt2
Hook Point Layer:
Activation Function: relu
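The configuration values above can be gathered into one small record, which makes it easy to derive quantities the listing only implies. The following is a minimal sketch: the `SAEConfig` class, its field names, and the `hook_layer` helper are hypothetical names chosen for illustration, not part of any library, and the `blocks.<n>.<hook>` naming pattern is assumed from the hook point string shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SAEConfig:
    # Values copied from the configuration listing; field names are hypothetical.
    hook_point: str = "blocks.6.hook_resid_pre"
    n_features: int = 46_080
    dtype: str = "torch.float32"
    architecture: str = "standard"
    context_size: int = 128
    activation_fn: str = "relu"
    n_prompts: int = 12_288


def hook_layer(hook_point: str) -> int:
    """Extract the block index from a 'blocks.<n>.<hook>' style name."""
    match = re.fullmatch(r"blocks\.(\d+)\.\w+", hook_point)
    if match is None:
        raise ValueError(f"unrecognized hook point: {hook_point!r}")
    return int(match.group(1))


cfg = SAEConfig()
layer = hook_layer(cfg.hook_point)                   # block index parsed from the hook name
total_tokens = cfg.n_prompts * cfg.context_size      # tokens covered by the dashboard prompts
print(layer, total_tokens)
```

Parsing the layer from the hook name, rather than storing it separately, keeps the two values from drifting apart.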


Negative Logits

Token              Logit
" largeDownload"   -0.75
"eric"             -0.73
"ibo"              -0.70
"zens"             -0.69
"pour"             -0.67
"urable"           -0.67
"oval"             -0.66
"temp"             -0.66
"Interstitial"     -0.65
"Lago"             -0.65

Positive Logits

Token              Logit
" omission"        1.12
" falsely"         0.77
" misrepresent"    0.73
" dece"            0.72
" attribut"        0.71
" accusation"      0.70
" deceive"         0.69
" accusations"     0.68
" false"           0.67
" mistaken"        0.67
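Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and then ranking the vocabulary by the result. The sketch below illustrates that idea under that assumption; it is not confirmed to be how this page computes its values. The function names are hypothetical, and the two-dimensional weights and four-token vocabulary are made-up toy numbers, not GPT-2 weights.

```python
def logit_effects(decoder_dir, unembed):
    """Compute decoder_dir @ unembed, giving one value per vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each a list of vocab_size floats.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * row[v] for d, row in zip(decoder_dir, unembed))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]


# Toy data for illustration only.
vocab = [" false", " omission", "eric", "oval"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.5, 1.0, -0.5, -0.25],
    [-0.25, -0.5, 0.5, 0.75],
]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
print(positive)
print(negative)
```

Under this reading, a positive value means the feature pushes the model toward predicting that token, and a negative value means it pushes away from it.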

Activations Density 21.597%
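To get a feel for what a density of this size means, it can be converted into an approximate token count. The sketch below assumes that density is the fraction of tokens on which the feature has a nonzero activation, and that it was measured over the dashboard prompt set (12,288 prompts of 128 tokens each). Neither assumption is stated on the page, so the result is only a rough estimate.

```python
# Values taken from the page; the interpretation of "density" is an assumption.
density = 0.21597            # 21.597%
n_prompts = 12_288
tokens_per_prompt = 128

total_tokens = n_prompts * tokens_per_prompt
# Estimated number of tokens on which the feature is active.
approx_active_tokens = round(density * total_tokens)
print(total_tokens, approx_active_tokens)
```

A feature that fires on roughly one token in five is unusually dense for a sparse autoencoder, which is worth keeping in mind when reading the explanation attached to it.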

