Explanations

delusions and hallucinations

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
"YON"                                    -0.82
"fseek"                                  -0.81
"IZEN"                                   -0.81
"abstractmethod"                         -0.78
" Poni"                                  -0.78
"轼"                                     -0.77
"ílio"                                   -0.75
" Slat"                                  -0.73
"筆"                                     -0.71
"MERCE"                                  -0.71

Positive Logits

Token (quoted to show leading spaces)    Logit
" paranoid"                              3.03
" paranoia"                              3.00
"paran"                                  2.27
" delusions"                             2.22
" Paran"                                 2.00
"Paran"                                  1.89
" delusion"                              1.87
" hallucinations"                        1.82
" delusional"                            1.80
" halluc"                                1.75

Activations Density: 0.054%
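
To put the density figure in context, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes that "activations density" means the fraction of token positions in the dashboard prompts on which the feature has a non-zero activation, which the page itself does not state. The function names are hypothetical and exist only for this example.

```python
# Figures copied from the dashboard configuration and density readout.
NUM_PROMPTS = 24_576
TOKENS_PER_PROMPT = 128
ACTIVATION_DENSITY_PERCENT = 0.054


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions covered by the dashboard prompts."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> float:
    """Estimated count of token positions where the feature fires.

    Assumption: density is the share of token positions with a
    non-zero activation, expressed as a percentage.
    """
    return total * density_percent / 100.0


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, ACTIVATION_DENSITY_PERCENT)

print(f"Total token positions: {total:,}")
print(f"Estimated active positions: {active:,.0f}")
```

Under that assumption, the feature would fire on roughly 1,700 of about 3.1 million token positions, which fits a feature tied to a narrow vocabulary such as " paranoid" and " delusions".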