INDEX
Explanations
contrasting conjunctions and related terms
Follows the word "and"
Negative Logits
".    -1.39
.     -1.37
).    -1.34
”.    -1.30
'.    -1.28
].    -1.24
。    -1.22
’.    -1.17
}$.   -1.16
}}$.  -1.15
Positive Logits
{},        0.82
?”,        0.81
________,  0.80
*/,        0.79
'',        0.79
``,        0.77
?',        0.76
$/,        0.75
`,         0.74
>",        0.74
Activations Density 1.640%