INDEX
Explanations
questions and prompts in the text
questions about definitions and explanations of concepts
NEGATIVE LOGITS
scarcely    -0.70
."[         -0.67
akable      -0.65
efully      -0.64
,[          -0.63
_>          -0.62
.</         -0.61
thought     -0.60
tur         -0.60
thereafter  -0.59
POSITIVE LOGITS
?:          1.12
?           0.91
?           0.88
Use         0.83
)?          0.81
'?          0.79
Funds       0.78
.?          0.78
USE         0.77
Supported   0.77
Activation density: 0.238%