Explanations

jealousy, possessiveness, protectiveness

Configuration

Prompts (Dashboard): 238,145 prompts, 512 tokens each

Dataset (Dashboard): lmsys + oasst1

Negative Logits

Token            Logit
" dissipation"   0.46
"碚"             0.39
" stunned"       0.37
"laugh"          0.37
"ध्यात्म"          0.36
"BCH"            0.36
"muted"          0.36
" gian"          0.36
"CW"             0.35
"grey"           0.35

Positive Logits

Token            Logit
" jealous"       1.51
" jealousy"      1.47
" Jealous"       1.34
" possess"       1.13
"嫉"             1.06
" protect"       1.05
" protective"    1.05
" Protective"    0.98
"protective"     0.97
" Protect"       0.96

Activations Density 0.049%
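
To give a sense of scale for the density figure, the short sketch below converts it into an approximate token count. It rests on two assumptions that the dashboard does not confirm: that the density is measured over every token of the 238,145 prompts listed under Configuration, and that "density" means the fraction of tokens on which the feature has a nonzero activation. The variable names are illustrative only.

```python
# Figures copied from the Configuration and Activations Density lines.
NUM_PROMPTS = 238_145
TOKENS_PER_PROMPT = 512
ACTIVATION_DENSITY_PERCENT = 0.049

# Assumption: every prompt contributes its full 512 tokens to the denominator.
total_tokens = NUM_PROMPTS * TOKENS_PER_PROMPT

# Assumption: density = (tokens with nonzero activation) / (all tokens).
estimated_active_tokens = round(total_tokens * ACTIVATION_DENSITY_PERCENT / 100)

print(f"total tokens: {total_tokens:,}")
print(f"estimated active tokens: {estimated_active_tokens:,}")
```

Under those assumptions the feature would fire on roughly 60,000 of about 122 million tokens, which fits a sparse feature tied to a narrow topic such as jealousy and protectiveness.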