Explanations

desirable/undesirable actions, traits, behavior

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token                Logit
(token not shown)    1.88
"大丈夫"              1.85
"م"                  1.85
"hearted"            1.84
"게"                  1.66
" pique"             1.65
(token not shown)    1.62
"ed"                 1.59
(token not shown)    1.57
" dint"              1.54

Positive Logits

Token                Logit
" ámbitos"           1.81
"ري"                 1.77
"íamos"              1.74
"🅐"                  1.66
"ث"                  1.64
"ام"                 1.61
"ード"                1.60
" mocy"              1.60
"st"                 1.59
" grados"            1.58

Activations Density 0.009%
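
To give the density figure a sense of scale, the sketch below combines it with the dashboard configuration (238,145 prompts, 512 tokens each). It assumes the density was measured over exactly that prompt set and that it counts the fraction of token positions with a nonzero activation; the page states neither, so treat the result as a rough estimate. The function and constant names are hypothetical and chosen only for illustration.

```python
# Figures copied from the dashboard; everything else here is an assumption.
NUM_PROMPTS = 238_145        # "238,145 prompts"
TOKENS_PER_PROMPT = 512      # "512 tokens each"
DENSITY_PERCENT = 0.009      # "Activations Density 0.009%"


def total_tokens(num_prompts: int, tokens_per_prompt: int) -> int:
    """Total token positions in the dashboard prompt set."""
    return num_prompts * tokens_per_prompt


def estimated_active_tokens(total: int, density_percent: float) -> int:
    """Estimated number of token positions where the feature fires.

    Assumes density is the percentage of token positions with a
    nonzero activation, measured over the same prompt set.
    """
    return round(total * density_percent / 100)


total = total_tokens(NUM_PROMPTS, TOKENS_PER_PROMPT)
active = estimated_active_tokens(total, DENSITY_PERCENT)

print(f"Total token positions: {total:,}")
print(f"Estimated active positions: {active:,}")
```

Under these assumptions the feature fires on roughly 11,000 of about 122 million token positions, which is consistent with a sparse feature that activates only on a narrow set of contexts.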