INDEX
Explanations
instances of conversational prompts or questions
Negative Logits
ojis      -0.16
eniable   -0.15
ön        -0.15
uhn       -0.15
uve       -0.14
Sphere    -0.14
oto       -0.13
ubi       -0.13
Sadly     -0.13
İY        -0.13
Positive Logits
fear       0.57
Fear       0.51
Fear       0.50
fret       0.48
worry      0.46
don        0.46
don        0.41
Don        0.39
Don        0.38
worries    0.38
Activation Density: 0.125%
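Dashboards like this one commonly get the logit lists by projecting the feature's decoder direction through the model's unembedding matrix, and get the activation density as the percentage of sampled tokens on which the feature fires. The sketch below illustrates both calculations in pure Python. It rests on several assumptions that the source does not confirm: the decoder-times-unembedding projection, a "fires" threshold of 0, and the helper names `top_logits` and `activation_density`, which are hypothetical. The toy vocabulary and weights exist only for illustration.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],      # assumed: feature decoder vector, length d_model
    W_U: Sequence[Sequence[float]],    # assumed: unembedding, shape d_model x vocab_size
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) top-k tokens by decoder_dir @ W_U (hypothetical helper)."""
    d_model = len(decoder_dir)
    # Logit contribution of this feature to each vocabulary token.
    scores = [
        sum(decoder_dir[d] * W_U[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]                # most negative first
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]     # most positive first
    return positive, negative


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["fear", "worry", "Sadly", "ojis"]
W_U = [[0.57, 0.46, -0.13, -0.16],
       [0.0, 0.0, 0.0, 0.0]]
pos, neg = top_logits([1.0, 0.0], W_U, vocab, k=2)
print(pos)  # [('fear', 0.57), ('worry', 0.46)]
print(neg)  # [('ojis', -0.16), ('Sadly', -0.13)]

# One active token out of 800 gives 0.125%.
print(activation_density([0.0] * 799 + [2.5]))  # 0.125
```

A density this low, with only about 1 in 800 tokens active, describes a sparse feature. Under the projection assumption above, its positive logits ("fear", "fret", "worry", "don") would be the tokens it most directly promotes when it fires.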