INDEX
Explanations
phrases indicating personal opinions or feelings
Negative Logits
“           -1.91
’           -1.81
‘           -1.70
”           -1.67
’,          -1.57
.’          -1.53
’.          -1.47
“           -1.47
.”          -1.46
,’          -1.45
Positive Logits
。"          1.42
Efq         1.32
-"          1.32
^(@)        1.30
...'        1.29
...'        1.24
:"          1.22
Jefus       1.22
doubtnut    1.20
--"         1.18
Activations Density 0.719%