INDEX
Explanations
punctuation marks, specifically parentheses
Negative Logits
{}'.
-0.78
ValueStyle
-0.73
<bos>
-0.72
{}'.
-0.70
';
-0.69
[];
-0.63
`,
-0.62
<?,
-0.62
question
-0.62
}}</
-0.61
Positive Logits
("
2.38
(“
2.29
(“
2.24
("
2.13
(„
1.87
('
1.81
(‘
1.78
(«
1.72
('
1.67
(‘
1.63
Activations Density: 0.075%