INDEX

Explanations

open and honest communication

New Auto-Interp

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

"支援"             0.43
" frantic"         0.40
" photovoltaic"    0.40
" tuples"          0.39
"Vid"              0.38
" nginx"           0.38
" motivation"      0.38
"PV"               0.38
" enticing"        0.38
"眺"               0.37

Positive Logits

" honest"          0.80
"honest"           0.73
" honesty"         0.71
" truthful"        0.68
" truthfully"      0.67
" openly"          0.67
" Honest"          0.66
" unpopular"       0.65
" vérit"           0.64
"实话"             0.63

Activations Density 0.123%