Explanations
numerical data or references in a scholarly context
Negative Logits
Token     Logit
82        -0.16
.Widget   -0.15
pre       -0.15
867       -0.14
91        -0.14
recently  -0.14
sustain   -0.14
83        -0.14
re        -0.13
c         -0.13
Positive Logits
Token     Logit
101       0.41
109       0.32
100       0.31
108       0.29
110       0.29
111       0.28
103       0.27
338       0.27
339       0.27
117       0.25
Activation Density: 0.010%