Explanations

witness

Configuration

Prompts (Dashboard)

16,384 prompts, 128 tokens each

Dataset (Dashboard)

monology/pile-uncopyrighted

Negative Logits

 Resident

-0.29

 Robbins

-0.26

 resident

-0.26

icas

-0.25

圍

-0.25

lus

-0.24

围

-0.24

 enth

-0.24

 Cliff

-0.24

icao

-0.24

Positive Logits

证

0.37

proof

0.34

见证

0.34

 proof

0.33

管理局

0.31

ié

0.30

證

0.30

遇见

0.27

遇到

0.27

 witnessing

0.27

Activations Density 0.946%
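
As a rough sanity check, the density figure can be turned into an approximate token count. The sketch below assumes the density is the fraction of tokens in the dashboard prompt set (16,384 prompts of 128 tokens each) on which the feature has a non-zero activation; the page does not state this definition, and the function name is hypothetical.

```python
# Figures taken from the Prompts (Dashboard) and Activations Density lines.
NUM_PROMPTS = 16_384
TOKENS_PER_PROMPT = 128
DENSITY_PERCENT = 0.946


def estimated_active_tokens(num_prompts: int, tokens_per_prompt: int,
                            density_percent: float) -> tuple[int, int]:
    """Return (total tokens, estimated tokens with a non-zero activation).

    Assumption: density is measured over every token in the prompt set.
    """
    total = num_prompts * tokens_per_prompt
    active = round(total * density_percent / 100)
    return total, active


total_tokens, active_tokens = estimated_active_tokens(
    NUM_PROMPTS, TOKENS_PER_PROMPT, DENSITY_PERCENT
)
print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {active_tokens:,}")
```

Under that assumption the feature fires on roughly one token in a hundred, so most 128-token prompts would contain at most one or two activating tokens.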