INDEX
Explanations
questions or prompts with a question mark at the end
question marks in various contexts
Negative Logits
ikuman         -0.87
referen        -0.85
corrid         -0.78
subur          -0.73
carbohyd       -0.73
earthqu        -0.71
transition     -0.70
nodd           -0.69
srf            -0.67
unintention    -0.67
Positive Logits
Does           1.56
Answer         1.45
Are            1.44
Would          1.44
Surely         1.43
Should         1.42
How            1.41
Could          1.38
Wouldn         1.37
What           1.36
Activations Density: 0.149%