Explanations

detachment and virtues

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

" enlightenment"  0.56
" enlighten"      0.43
"ements"          0.42
"ambarkan"        0.42
" enlight"        0.42
" передви"        0.42
" rhetoric"       0.41
"enseignement"    0.40
"象征"            0.40
"姿"              0.40

Positive Logits

" detachment"     0.64
" detached"       0.56
" detach"         0.56
" sincere"        0.54
"equ"             0.54
"sad"             0.52
"SAD"             0.51
"DET"             0.51
" steady"         0.50
"Equ"             0.48
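Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The page itself does not say how its values were computed, so treat that method as an assumption. The minimal sketch below illustrates the idea with a hypothetical 3-dimensional decoder vector and a toy 4-token unembedding matrix (not this feature's real weights), and it skips any final layer normalization a real model might apply. The helper names `logit_effects` and `top_and_bottom` are hypothetical.

```python
# Minimal sketch (ASSUMPTION): a token's score is the dot product of a
# hypothetical feature decoder vector with that token's column of a
# hypothetical unembedding matrix W_U (shape: d_model x vocab).

def logit_effects(decoder_dir, W_U, vocab):
    """Return {token: score} with score = decoder_dir . W_U[:, j]."""
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }

def top_and_bottom(scores, k):
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]

# Toy example: 3-dimensional residual stream, 4-token vocabulary.
vocab = [" detachment", " enlightenment", " sincere", " rhetoric"]
decoder_dir = [1.0, 0.0, -0.5]        # hypothetical feature direction
W_U = [                               # hypothetical d_model x vocab matrix
    [0.6, -0.2, 0.3, -0.1],
    [0.1, 0.9, 0.2, 0.5],
    [-0.1, 0.6, -0.2, 0.2],
]

scores = logit_effects(decoder_dir, W_U, vocab)
positive, negative = top_and_bottom(scores, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the sketch short; with a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.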

Activation Density: 0.007%
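Activation density is typically the share of token positions on which the feature fires (activation above zero). If the 0.007% figure were measured over the entire prompt set listed under Configuration (an assumption, since the page does not give the denominator), it would correspond to roughly 8,535 active tokens out of 121,930,240. The short calculation below shows that arithmetic, plus a hypothetical `activation_density` helper applied to a toy activation trace.

```python
# ASSUMPTION: density is measured over every token of the dashboard prompt set.

def activation_density(activations):
    """Fraction of token positions where the feature is active (> 0)."""
    return sum(1 for a in activations if a > 0) / len(activations)

total_tokens = 238_145 * 512           # prompts x tokens per prompt
density = 0.007 / 100                  # 0.007% expressed as a fraction
expected_active = total_tokens * density
print(total_tokens, round(expected_active))

# Toy check on a hypothetical 4-token activation trace: 1 of 4 positions fires.
toy_density = activation_density([0.0, 0.0, 1.3, 0.0])
print(toy_density)
```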