INDEX
Explanations
eliminate and step by step reasoning
Negative Logits
selected       0.48
solution       0.48
Solution       0.46
chosen         0.43
োত্তর          0.42
arbitrarily    0.42
Selected       0.42
solution       0.41
assumes        0.41
manifold       0.41
Positive Logits
elimination    0.71
Elim           0.67
eliminate      0.63
elimin         0.63
Eliminate      0.61
elim           0.60
elim           0.60
Elimination    0.59
elimin         0.57
Elim           0.56
Activations Density 0.030%