INDEX
Explanations
repeated use of the apostrophe
Negative Logits
"     -0.31
)     -0.30
'     -0.30
:     -0.29
.     -0.29
,     -0.25
↵     -0.22
)↵    -0.22
-     -0.22
]     -0.21
Positive Logits
/'    0.26
...'  0.23
-'    0.22
.'    0.21
.'.   0.21
ÂĢÂĻ  0.20
--    0.19
'.    0.18
—     0.18
..'   0.18
Activations Density: 0.085%