INDEX
Explanations
punctuation marks and quotation marks in the text
Negative Logits
Token    Logit
)*/      -1.19
")}      -1.11
')));    -1.09
."));    -1.09
.)}      -1.07
).}      -1.07
"]}      -1.04
')]      -1.04
")));    -1.04
")))     -1.03
Positive Logits
Token    Logit
‘        0.70
':       0.57
“        0.56
’        0.56
":       0.53
`        0.53
enfans   0.52
`.       0.51
Psychol  0.51
識       0.50
Activations Density 0.204%