INDEX
Explanations
questions and responses from a structured conversation
question formats and references to inquiries or prompts
Negative Logits
revel -0.70
angering -0.69
outweigh -0.69
bloom -0.68
doomed -0.68
utton -0.68
aden -0.67
overshadow -0.66
stakes -0.66
culmin -0.65
Positive Logits
Hello 1.35
Hi 1.30
Hi 1.28
Hello 1.25
reetings 1.14
Hey 1.08
Question 1.06
Dear 1.05
Hey 1.03
Okay 1.02
Activation Density: 0.138%