Explanations

language models, concern, starvation

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"墙"  0.48
" walls"  0.48
"Walls"  0.46
" boîte"  0.45
" gespre"  0.45
" Walls"  0.44
"壁"  0.44
" überw"  0.43
"前提"  0.43
"Waste"  0.42

Positive Logits

"alakip"  0.44
"these"  0.44
" this"  0.43
" veloce"  0.43
"anfaatkan"  0.42
" noon"  0.42
"ည"  0.42
" adjustable"  0.41
"ighted"  0.41
"awareness"  0.41

Activations Density 0.001%