INDEX
Explanations
offering elaborations or alternatives
NEGATIVE LOGITS
procedures 0.89
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.86
↵↵↵↵↵↵↵↵ 0.85
↵↵↵↵↵↵↵ 0.85
↵↵↵↵↵↵↵↵↵ 0.85
↵↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.84
↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.84
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.84
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.84
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵ 0.84
POSITIVE LOGITS
Note 1.56
Edit 1.55
EDIT 1.39
Alternatively 1.36
Bonus 1.34
To 1.33
edit 1.27
PS 1.25
Also 1.23
NOTE 1.20
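On SAE dashboards, lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The following is a minimal sketch under that assumption. The function name `top_logit_effects`, its inputs, and the toy two-dimensional numbers are hypothetical. They are not this feature's real weights, and the sketch is not necessarily this dashboard's exact procedure.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=3):
    """Rank tokens by the dot product of a feature's decoder direction with each
    column of the unembedding matrix (hypothetical helper, toy inputs).

    decoder_row: list of d_model floats (the feature's decoder direction)
    unembed:     d_model x vocab_size nested list (unembedding matrix)
    vocab:       list of vocab_size token strings
    """
    scores = []
    for t, tok in enumerate(vocab):
        # Logit contribution to token t from adding the decoder direction.
        s = sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        scores.append((tok, s))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # k lowest-scoring tokens
    positive = ranked[::-1][:k]    # k highest-scoring tokens
    return positive, negative


# Toy example: d_model = 2, four-token vocabulary (made-up numbers).
vocab = ["Note", "Edit", "procedures", "the"]
decoder_row = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.6, 0.0],
    [0.2, 0.4, -0.4, 0.1],
]
pos, neg = top_logit_effects(decoder_row, unembed, vocab, k=2)
print("positive:", [tok for tok, _ in pos])
print("negative:", [tok for tok, _ in neg])
```

Because the ranking uses only the weights, the logit lists describe what the feature pushes the model to predict. They do not describe which inputs make it fire.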
Activations Density 0.140%
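Activation density is usually defined as the fraction of sampled token positions where the feature's activation is above zero. Under that definition, 0.140% means roughly 14 firing tokens per 10,000. The following is a minimal sketch that assumes this definition with a zero threshold. The function name and the synthetic activation list are hypothetical and were not drawn from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 10,000 token positions, the feature fires on 14 of them.
acts = [0.0] * 10_000
for i in range(0, 14 * 700, 700):  # positions 0, 700, ..., 9100
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.140%
```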