Explanations

frankly and honestly

Configuration

Prompts (Dashboard): 24,576 prompts, 128 tokens each

Dataset (Dashboard): monology/pile-uncopyrighted

Negative Logits

Token (quoted to show leading spaces)    Logit
" contextes"     -0.83
"今は"           -0.83
"など"           -0.82
" prawie"        -0.81
"玦"             -0.78
" megfelelő"     -0.77
" कैसी"           -0.76
" quartiers"     -0.75
" たく"          -0.75
"こちらは"       -0.75

Positive Logits

Token (quoted to show leading spaces)    Logit
" honestly"      4.06
"tbh"            3.59
" frankly"       3.42
"honestly"       3.03
"Honestly"       3.00
" Honestly"      2.97
" truth"         2.95
"TB"             2.89
"to"             2.80
"Frankly"        2.59

Activations Density 0.076%
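
To give a sense of scale for the density figure, the sketch below converts it into an approximate count of activating tokens. It assumes (this is not stated on the page) that the density is the share of token positions with a nonzero activation, measured over the same 24,576 dashboard prompts of 128 tokens each. The variable names are illustrative only.

```python
# Figures taken from the dashboard configuration and density line.
num_prompts = 24_576
tokens_per_prompt = 128
density_percent = 0.076

# Total token positions in the dashboard prompt set.
total_tokens = num_prompts * tokens_per_prompt

# ASSUMPTION: density = fraction of token positions where the feature fires,
# computed over this same prompt set.
approx_active_tokens = total_tokens * density_percent / 100

print(f"total tokens: {total_tokens:,}")
print(f"approx. active tokens: {approx_active_tokens:,.0f}")
```

Under that assumption, the feature fires on roughly 2,400 of about 3.1 million token positions, which fits a feature tied to a narrow set of words such as "honestly" and "frankly".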