Explanations

underlying psychological / features

Configuration

Prompts (Dashboard)

238,145 prompts, 512 tokens each

Dataset (Dashboard)

lmsys + oasst1

Negative Logits

 references

0.38

 파일을

0.38

ｂ

0.37

λος

0.37

INTa

0.37

hentication

0.36

 preferring

0.36

 "/",

0.35

 preferences

0.34

 concordance

0.34

Positive Logits

心中

0.44

心中的

0.41

sim

0.40

தீ

0.39

 importantes

0.38

van

0.38

 எல்லோ

0.38

 эмне

0.38

 viktigt

0.37

重要

0.37
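Dashboards like this one commonly produce the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The source does not say how these lists were computed, so treat the sketch below as an assumption. The function `top_logits`, its parameters, and the toy matrices are all hypothetical. The code uses plain Python lists so it runs without extra dependencies.

```python
def top_logits(decoder_dir, unembed, vocab, k=3):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    Hypothetical sketch: decoder_dir is the feature's decoder vector (length d_model),
    unembed is a d_model x vocab_size matrix given as a list of rows, and vocab
    holds the token strings in column order.
    """
    d_model = len(decoder_dir)
    # Score for token j is the dot product of decoder_dir with column j of unembed.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending, so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_logits(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.44, -0.38, 0.10], [0.0, 0.5, 0.2]],
    vocab=["心中", " references", "sim"],
    k=1,
)
print(pos, neg)
```

Tokens keep their leading spaces, such as `" references"`, because most tokenizers treat the space as part of the token. This is also why the lists above show some entries with a leading space.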

Activation density: 0.001%
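To give a sense of scale, we can combine the density with the dashboard configuration. This assumes the density is measured over every token of the 238,145 prompts × 512 tokens listed above, which the source does not confirm. Under that assumption, 0.001% works out to roughly 1,200 active tokens. The helper name `expected_active_tokens` is hypothetical.

```python
def expected_active_tokens(num_prompts, tokens_per_prompt, density_percent):
    """Return (total tokens, expected number of tokens where the feature fires).

    Assumes density_percent is the share of all dashboard tokens with a
    nonzero activation for this feature.
    """
    total = num_prompts * tokens_per_prompt
    active = total * density_percent / 100
    return total, active


total, active = expected_active_tokens(238_145, 512, 0.001)
print(total, round(active))
```

If the density was instead computed on a separate sample, the same arithmetic still applies, using that sample's token count.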