INDEX
Explanations
question-answer pairs and question-like sentence structures
Questions
Negative Logits
Token        Logit
how          -0.91
How          -0.74
howto        -0.72
How          -0.71
how          -0.71
HOW          -0.69
HOW          -0.64
Bagaimana    -0.62
(            -0.59
bagaimana    -0.58
Positive Logits
Token        Logit
Does         0.85
Does         0.82
does         0.74
Are          0.74
Is           0.73
Will         0.72
Did          0.68
Will         0.66
Are          0.66
are          0.66
Activations Density 1.171%