INDEX

Explanations

thresholds and numerical comparisons

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

Token  Logit
" hazırl"  0.36
" görev"  0.34
"ങ്ങളും"  0.34
"猜测"  0.33
"снов"  0.33
"壶"  0.33
"Premi"  0.33
"Ticker"  0.32
" Павел"  0.32
" సంగ"  0.32

Positive Logits

Token  Logit
" threshold"  1.02
" thresholds"  0.95
"threshold"  0.92
" Threshold"  0.80
">="  0.79
"≤"  0.75
"Threshold"  0.75
"≥"  0.74
"(>"  0.71
"$(<"  0.70

Activations Density: 0.262%