INDEX
Explanations
numerical data and references
Negative Logits
Token          Logit
-↵             -0.23
-              -0.20
‐ (U+2010)     -0.20
...↵           -0.19
...'↵          -0.18
(...)↵         -0.18
<strong        -0.17
..."↵          -0.17
...)↵          -0.17
‑ (U+2011)     -0.16
Positive Logits
Token          Logit
_              0.30
esl            0.30
essay          0.29
Essay          0.28
(_             0.28
dissertation   0.24
--             0.24
essays         0.23
Dissertation   0.23
.--            0.23
Activations Density: 0.031%