INDEX
Explanations
Mathematical symbols and notation, particularly in the context of equations and parameters
Negative Logits
$.}         -0.75
)$}         -0.66
</tbody>    -0.66
?”.         -0.58
مشين        -0.58
}*/         -0.58
"""         -0.57
}*/         -0.56
."""        -0.54
}$}         -0.54
Positive Logits
\\          1.00
\\          0.99
\\[         0.83
)\\         0.80
;\\         0.76
\\[         0.76
'\\         0.75
,\\         0.75
}\\         0.71
]\\         0.71
Activations Density: 0.795%