INDEX
Explanations
punctuation marks, particularly periods and quotation marks
Negative Logits
：↵↵     -0.18
：↵      -0.15
:↵↵      -0.15
:↵↵↵     -0.15
:↵↵↵↵    -0.15
['__     -0.15
:");↵    -0.14
:↵↵      -0.14
tek      -0.14
elles    -0.14
Positive Logits
Adds     0.27
"And     0.26
"But     0.25
“And     0.22
"        0.22
“But     0.21
Adds     0.21
added    0.20
Added    0.20
Added    0.19
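Logit lists like the two above are commonly obtained by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below shows that idea under stated assumptions: `decoder_row`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical names, the numbers are made up, and the final LayerNorm and any biases are ignored. It is an illustration of the general technique, not necessarily how this dashboard computed its values.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: `decoder_row` is one SAE feature's decoder vector and
# `unembed` maps each token string to its unembedding column (W_U[:, token]).
# Final LayerNorm and biases are ignored in this sketch (an assumption).

def logit_effects(decoder_row: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Direct effect of the feature on each token's logit: decoder_row · W_U[:, token]."""
    return {
        token: sum(d * u for d, u in zip(decoder_row, column))
        for token, column in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]    # most promoted tokens, most positive first
    return negative, positive

if __name__ == "__main__":
    # Toy 2-dimensional example with made-up numbers.
    decoder_row = [1.0, -0.5]
    unembed = {
        '"And': [0.2, -0.1],
        "Adds": [0.1, 0.0],
        ":": [-0.1, 0.1],
        "tek": [0.0, 0.2],
    }
    effects = logit_effects(decoder_row, unembed)
    negative, positive = top_and_bottom(effects, k=2)
    print("Negative:", negative)
    print("Positive:", positive)
```

In the toy run, ":" and "tek" land in the negative list and '"And' and "Adds" in the positive list, mirroring the shape of the lists above.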
Activation density: 0.107%
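Activation density is usually read as the share of token positions on which a feature fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the sample activations are hypothetical, and the dashboard's exact definition may differ.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one token")
    return 100.0 * active / total

if __name__ == "__main__":
    # Made-up activations over 8 token positions; the feature fires on one.
    acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(f"{activation_density(acts):.3f}%")  # prints 12.500%
```

Under this definition, a density of 0.107% means the feature fires on roughly one token in 935.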